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ABSTRACT 

We address the inverse problem of cosmic large-scale structure reconstruction from a 
Bayesian perspective. For a linear data model, a number of known and novel reconstruction 
schemes, which differ in terms of the underlying signal prior, data likelihood, and numerical 
inverse extra-regularization schemes are derived and classified. The Bayesian methodology 
presented in this paper tries to unify and extend the following methods: Wiener-filtering, 
Tikhonov regularization. Ridge regression. Maximum Entropy, and inverse regularization 
techniques. The inverse techniques considered here are the asymptotic regularization, the 
Jacobi, Steepest Descent, Newton-Raphson, Landweber-Fridman, and both linear and non- 
linear Krylov methods based on Fletcher-Reeves, Polak-Ribiere, and Hestenes-Stiefel Conju- 
gate Gradients. The structures of the up-to-date highest-performing algorithms are presented, 
based on an operator scheme, which permits one to exploit the power of fast Fourier trans- 
forms. Using such an implementation of the generalized Wiener-filter in the novel ARGO- 
software package, the different numerical schemes are benchmarked with 1-, 2-, and 3- 
dimensional problems including structured white and Poissonian noise, data windowing and 
blurring effects. A novel numerical Krylov scheme is shown to be superior in terms of perfor- 
mance and fidelity. These fast inverse methods ultimately will enable the application of sam- 
pling techniques to explore complex joint posterior distributions. We outline how the space of 
the dark-matter density field, the peculiar velocity field, and the power spectrum can jointly 
be investigated by a Gibbs-sampling process. Such a method can be applied for the redshift 
distortions correction of the observed galaxies and for time-reversal reconstructions of the 
initial density field. 

Key words: large-scale structure of Universe - galaxies: distances and redshifts - methods: 
data analysis - methods: statistical - methods: numerical - techniques: image processing 



1 INTRODUCTION 

According to our current picture of cosmogenesis, the galaxies, 
galaxy clusters, galaxy filaments, and giant voids forming the cos- 
mic large-scale structure (LSS) are products of gravitational insta- 
bility, which pulls increasingly more matter onto the tiny primor- 
dial seed density fluctuations generated at the very first epoch of 
inflation. The shape and size of the cosmic matter distribution re- 
flects the initial conditions set during or shortly after Big Bang, as 
well as the interplay of the gravitational self-attraction of matter 
and the diluting action of the Hubble expansion of cosmic space. 
Valuable information about the properties and the origin of the cos- 
mic inventory are encoded in the LSS, however, on small-scales, 
that information is being erased through dynamical non-linear pro- 
cesses. 

Our goal is to extract as much of this information as possible 
from astronomical measurements, which introduce uncertainties 



and, consequently, degeneracies. Therefore, we have to adapt an 
information-theoretical approach to solve the reconstruction prob- 
lem of cosmography. The Bayesian framework turns out to be the 
most general approach as we will discuss later. In this paper we 
present the novel ARGC0-software package, which reconstructs the 
three-dimensional density field from the information provided by 
galaxy surveys with different Bayesian and inverse methods. Here 
we focus our study on understanding the Bayesian theoretical back- 
ground and the required algorithmic aspects. Further extensions of 
the code in which the power-spectrum and the peculiar velocities 
can be jointly sampled are presented and tested on mock galaxy 
catalogues. Some of the preliminary results are presented and fu- 
ture development is outlined. 

The large number of telescopes performing galaxy surveys 
with increasing depth, sky coverage, and accuracy in position and 
distance (or redshift) determination provide us with superb data on 
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the cosmic matter distribution at an exponentially increasing rate. 
One problem is that the discrete objects these instruments reveal 
to us, the galaxies, are the result of a complex non-linear evolu- 
tion of cosmic matter combined with complicated astrophysical 
processes such as star formation. A translation of the galaxy data 
into the much better understood large-scale dark matter (DM) dis- 
tribution, which would be much easier to analyze for imprints of 
cosmologically interesting effects, is far from trivial. The discrete 
nature of galaxies introduces certain noise, usually modeled by shot 
noise. Moreover, the partially understood galaxy-formation pro- 
cess inserts systematic uncertainties. In addition, the limited vol- 
ume of surveys adds complications beyond the problems of galaxy- 
distance determination being contaminated by observational and 
velocity redshift-distortions. All these complications have to be 
dealt with simultaneously and in a controlled fashion. Since it can- 
not be assumed that the correct or optimal values for the various de- 
grees of freedom of the problem (bias factors, redshift-corrections, 
etc.) will be guessed a priory, repeated and iterative data analysis is 
mandatory in order to achieve a high-fidelity and well-understood 
cosmic map. For example, a correction of redshift-distortions of the 
galaxies requires the gravitational potential generated by the matter 
distribution to be reconstructed. 

Repeated generation of cosmic matter maps increases the urge 
to face another challenge, the scaling of the performance of the 
underlying map-generation algorithms with the data size. Since 
the matter-density information displayed at a location on a map 
may depend on all input data (galaxy positions), any algorithm op- 
timized to information theory scales super-linea|3. With increas- 
ing survey sizes, increasing requirements for spatial resolution 
and volume coverage, and the need to frequently re-iterate the 
map-generation step, the algorithm has to scale closely to linear 
with data size, otherwise its application is strongly limited. For- 
mer applications in cosmography suffered from such inconvenient 
performance-scaling, and an effort has to be made to develop si- 
multaneously high-performance and accurate methods. 

The work presented in this paper developes the general 
methodology of Bayesian reconstruction of the cosmic matter dis- 
tribution, based on the invaluable pioneering work of many other 
scientists, which will be discussed below, and extends this work to 
a series of new applications. Existing and novel map making algo- 
rithms are summarized in terms of a classification of their Bayesian 
likelihood and prior functions. The implementation, optimization, 
and comparison of various numerical schemes are addressed in 
detail. This provides a starting point for a correct information- 
theory approach to cosmography. Many additional problems, not 
addressed in this paper, such as the galaxy bias, will also have to be 
solved before accurate maps of the dark matter distribution in our 
still mysterious Universe can be generated. 

Such an undertaking would be highly rewarded in the short 
and long run. An accurate map of the cosmic matter distribution 
would be valuable for a manifold of direct scientific applications. 
These range from structure-formation analysis, to cosmological pa- 
rameter estimation via power- spectrum measurements, dark energy 
studies, galaxy-cluster identification and galaxy-bias studies. Accu- 
rate cosmic maps would help to determine weak signals associated 
with the large-scale structure such as the integrated Sachs-Wolf 
(ISW) effect, or the extended Sunyaev-Zel'dovich (SZ) effect, the 



^ A map of galaxy counts can be generated by an algorithm with linear 
scaling to data size however, it is not an optimal representation of the un- 
derlying matter field. 




Figure 1. The hierarchical Bayes model for a galaxy distribution in red- 
shift space (5| is represented here in a directed acychc graph (DAG). The 
cosmological parameters p^^sm govern the rest of the variables. The ini- 
tial density field coming from e.g. inflationary scenarios can be statistically 
described by all its moments (^Qjyj). Here the power spectrum is usually 
taken, since the intitial perturbations are well described by a Gaussian real- 
ization of the initial seed fluctuations. The further evolution is described by 
nearly deterministic processes (given by structure and galaxy formation), 
which determine the later- time dark matter distribution <5dm with its pe- 
cuhar velocity field v and the bias function b that relates the galaxy distri- 
bution to the dark matter density field. The dark matter distribution (5dm 
with the bias produces the galaxy distribution in real space (5g . The pecuUar 
velocities v related to the density field through the continuity equation in- 
troduce the redshift distortion in (5g finally leading to the galaxy distribution 
in redshift space (5| . 

detection of which relies on the construction of optimal statistical 
filters for these signals. 

Finally, one could argue that mapping the distribution of mat- 
ter in the Universe represents a response to mankind's curiosity in 
its aim to discover terra incognita and find an orientation in space 
and time on cosmological scales and, therefore, should be a goal in 
itself. 

In the remainder of this introduction we give the sources of 
uncertainties, we present an overview of existent and new Bayesian 
reconstruction methods, subsequently we briefly describe the algo- 
rithmic development presented in this paper, and in the final part 
we give a more detailed overview of the structure of this paper. 

1.1 Classes of uncertainty 

Several classes of uncertainties related to the density-field recon- 
struction from galaxy surveys demand a statistical approach. Some 
of the uncertainties are intrinsic to the nature of the underlying sig- 
nal (the dark matter). Other uncertainties are intrinsic to the nature 
of the observable (the galaxies). And finally there are uncertainties 
due to degeneracies which appear through the observation and data 
mining process. 

(i) Intrinsic stochastic character: cosmic variance 

In cosmology it is generally assumed that the structure of the 
Universe comes from some infinitesimal quantum fluctuations 
which were frozen out and st r etched by an inflat i onary phase (see 
lGuthlll98lh [Outh & PillTgsl : IStarobinskvlll982l : lHawking||l982l: 
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lLindJl982l : P^lbrecht & Steinhardlll982l : lBardeen et al.ll983h . and 
later amplified by gravitational instability. According to this pic- 
ture, the seed fluctuations would have an intrinsic stochastic charac- 
ter and are mainly Gaussian distributed. However, the mechanisms 
that stretch the quantum fluctuations may also introduce deviations 
from Gaussianity which would then be imprinted in the seed fluctu- 
ations. In general all the moments of the initial fluctuations have to 
be considered {5dm ) • Nevertheless, most of the inflationary scenar- 
ios predict the density field to be very closely Gaussian distributed 
and it is generally sufficient to take the second order moment, the 
two-point correlation function, or the power-spectrum in Fourier- 
space. We will discuss below how to determine the power- spectrum 
and techniques to disentangle intrinsic non-Gaussianities within a 
Bayesian framework. Note, that there are alternative models to in- 
flation in which e.g. the seed fluctuations are identified with the 
topolog ical defects th at remain as relics of high-energy phase tran- 
sitions ( lKibblelll97d) . Accurate reconstructions of the LSS could 
help to discriminate between the different models. 

(ii) Physical uncertainties: galaxy bias 

The galaxy formation process is a complicated, non-linear and 
(probably) non-local process. It is known that on large scales the 
galaxy power-spectrum fits well to the expected DM spectrum pre- 
dicted from cosmic microwave background (CMB) observations, if 
some bias factor b between the amplitude of the galaxy and DM 
fluctuations is assumed. Detailed studies show that the bias fac- 
tor is not universal, but depends on galaxy type, galax y forma- 
tion time, redshift, etc. (see e.g. ICoorav & Shetlill2002l and ref- 
erences therein). For the purpose of reconstructing the underlying 
density field, linear biases can easily be tackled within the linear 
data model described below by including its effects in a selection 
function. Nevertheless, more complex biases have to be further in- 
vestigated in a Bayesian framework. Physical processes, which are 
not perfectly understood within galaxy formation may be treated 
in a statistical way, encoding the ignorance about certain physical 
processes in probability distribution functions. Several works study 
the stochastic non-linear galaxy biasing (see for ex ample lPenI 19981 : 
lDekel&Lahavlll999l: iTegmark & Bromlevl[l99 ?). Some of these 
models could be implemented in the Bayesian reconstruction pro- 
cess. This issue is out of scope in this paper, but should be further 
investigated in this framework. 

(iii) Physical/observational uncertainties: redshift- 
distortions 

The peculiar motion of galaxies with respect to the Hubble 
flow of the Universe: v, introduces uncertainties in the ir redshift 
meas urement, the so-called redshift-distortions (see e.g. lHamiltonI 
1 19981 for an introduction to this problem). The measured galaxy 
over-densities are thus said not to be in real-space 5^, but in 
redshift-space 5|. In the linear regime, where galaxies fall into the 
potential wells of large scale structures, redshift-distortions cause 
a squashing of the linear over-densities in radial direction. How- 
ever, in the non-linear regime, galaxies (e.g. in a galaxy cluster) 
tend to behave like particles in a gas with randomized motions 
inside the clusters where the potentials are very high. This pro- 
duces the so-called finger-of-god effect, a dispersion along the line 
of sight. The correction of these distortions is not trivial, since 
the process of structure formation partially erases the information 
about the initial fluctuations after entering the non-linear regime. 
Consequently, determining the real position of galaxies poses a 
degenerate problem, which has in general many possible solu- 
tions. Many efforts have been made to correct for these distortions: 
in the lin ear regime th ese efforts start with Kaiser's pioneering 
work (see lKaiseJll987h and are followed by the linear redshift- 



distortions operator (for a detailed derivation see lHamilton|[l998h . 
In the non-linear regime, these efforts include a velocity disper- 
sion factor (the dispersion-model) corresponding to an exponen- 
tial pair wise velocity distribution function with n o mean stream- 
ing (see lBallinger et aOl996h . [Scoccimarrol j2004) presents an ex- 
act relationship between real-space and redshift-space two-point 
statistics through the pairwise velocity distribution function includ- 
ing all non-linearities. More complex methods of correctin g for 
redshift-distortions were classified bv lSchmoldtetalJ ( ll999l) into 
iterative methods, which uses the redshift-space density to calcu- 
late a peculiar velocity field, and then i t eratively corrects t he den - 
sity field distortions cYahil et al.iri99ll : iKaiser & Stebbinsi il99lh . 
The other class decomposes the redshift-space density in radial and 
angular basis functions from whi ch the radi al redshift-disto rtion is 
corrected (see e.g . Lahav 1994; Nusser & D avis 1994; Fish er et al.l 
ll995l ; ISchmoldt et al.ll999l) and more recently P ercivall ( 1200^ . Be- 
low, we propose a Bayesian method to correct for the linear and 
non-linear redshift-distortions in a statistical way (see section [23t . 

(iv) Observational uncertainties: measurements 

The action of measurement introduces uncertainties, either due 
to the instruments, e.g. blurring by the telescope, or due to the ob- 
servational strategy, which is included in the noise term, the se- 
lection function, and the mask effects (see lZaroubi et al.ll 19951 for 
a pioneering work in the LSS field). Ignoring selection functions, 
windowing, or blurring will lead to strongly biased reconstructions, 
which are far from the real signal, and thus allow only very limited 
interpretation of the true physical picture. A numerical implementa- 
tion of these effects is presented in section (sec:operators). The in- 
fluence of these effects will then be analyzed separately and tested 
with our code. The results are presented in section Though 
ARGO demonstrates its capability to handle these uncertainties, fur- 
ther work is required in order to apply it to real data. Particular ex- 
pressions for the selection function according to the redshift survey 
under study, as well, as masks, etc., have to be implemented. 

(v) Mathematical/numerical representation uncertainties: 
aliasing effects 

Some uncertainties are not intrinsic to the observable, but orig- 
inate from the mathematical representation one chooses. Treating 
galaxies as counts in cells or with other mass-assignment schemes 
will smear out the information about their measured position for 
which one has to correct (see section U.3.2b in order to derive other 
quantities, like the power-spectrum (see section [2.6.2b . 

From all the points mentioned above we conclude, that extracting 
the underlying dark matter density field from the luminous mat- 
ter distribution given by galaxy redshift surveys poses a classical 
signal reconstruction problem. A Bayesian network depicting the 
relation of these uncertainties is shown in fig. l[T). 



1.2 Bayesian reconstruction mettiods 

Any Bayesian statistical approach requires the definition of a like- 
lihood and a prior. The former is the probability distribution func- 
tion describing the process generating the observational data. It can 
be interpreted as a distance measure of the observed data to the 
underlying signal, as we will discuss below. The prior stands for 
the distribution function modeling our prior knowledge on the sig- 
nal to be recovered. Mathematically it can be shown that it regu- 
larizes the estimator in the presence of noise (see section 12.5. U . 
Two kinds of priors have to be distinguished, informative priors, in 
which the previous physical knowledge about the signal is encoded, 
and non-informative priors, which try to give objective estimators 
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for the underlying signal based on purely information-theoretical 
arguments. Here, three non-informative priors are considered: flat 
priors (see section l2.5.5t with a constant probability distribution 
function (PDF), entropic priors based on Shannon's notion of in- 
formation (see section |2.5.9l l, and Jeffrey's prior based on invariant 
statistical structures under transformation of variables (see section 
12.5. 8t . Finally, a maximization or sampling of the posterior distri- 
bution, which is proportional to the product of the likelihood and 
the prior, has to be done to complete the Bayesian estimation. The 
maximization of the posterior is called the maximum a posteriori 
method (MAP). The maximum likelihood (ML) and maximum en- 
tropy method (MEM) are particular cases of the MAP with flat pri- 
ors and entropic priors, respectively. Complex posterior distribution 
functions may be sampled iteratively from conditional PDFs in a 
Markov Chain Monte Carlo fashion (MCMC), see section IZ61 We 
show how different choices for these distribution functions together 
with the estimation procedure lead to different reconstruction al- 
gorithms, which consequently have distinct application fields (see 
table[T]l. A review of existing methods is presented and new appli- 
cations for the large-scale structure reconstruction, which naturally 
emerge within the Bayesian formalism, are developed. 

In this work we consider Poissonian and Gaussian likelihoods 
for the galaxy distribution. The former has been previously con- 
sidered in image restoration especially for deconvolution purposes 
(see Richardson 1972 ; Lucy 1974) . For example, the Richardson- 
Lucy algo rithm can be derived as the ML of a Poissonian likeli- 
hood (see IShepp&Vardill 19821 and appendix |B- Here an image 
can be regarded as photon counts in cells represented by a Poisso- 
nian distribution. However, one should notice that this likelihood 
does not represent the galaxy-formation process. From a pure im- 
age reconstruction perspective, it can still be interesting for LSS 
estimations, because it naturally represents the discrete nature of 
a galaxy distribution. The Gaussian likelihood allows the incor- 
poration of arbitrary noise structures through the variance. The 
CMB map-making algorithms, which aim to convert time-ordered 
data received from satellites into a map of the CMB signal on the 
sky as a projection on the sphere, usually use this likelihood. In 
this case, the ML leads to the simple COBE-filter first derived 
by jjanssen & Gulkid jl992l) . Nevertheless, the complex scanning 
strategies and foreground re moval can add unlimited complex- 
ity to these algorithms (e.g. iNatoli et al. I I2OOII: lDoreetalJl200ll : 



IStompor et alj2002l : lKeihanen et alj2'oOa : lYvon & Mavetll2005l) . 

For the LSS the Gaussian prior arises as the natural in- 
formative prior due to the arguments discussed above. We pro- 
pose a novel algorithm: GAPMAP, which maximizes the poste- 
rior with a Gaussian prior and a Poissonian likelihood (see sec- 
tion 12.5.41 and appendix 0. In contrast, the Gaussian likelihood 
with the Gaussian prior leads to the well-known Wiener-filter, 
which has been used for the LSS reconstruction (seelFisher et al. 



199 4 l:'Hoffman'l994':'Lahav et al.'l994":'Lahav'l994VZar oubi et al . 
1995: Fisher et al. 1995; Webster et al. 1997; Zaroubi et al .1 1 19991 ; 
Schmoldt etal.lll999 ; Erdogdu et al. 2004, 2006) and for CMB 



mapping (see e.g. iBunn et al. .1994 ; Tegmark .19971) . It is also 
known to give optimal results in ter ms of yielding the l east s quare 
error, see the pioneering work of iRvbicki & PressI ( Il992h and 
IZaroubi et al. I (l995). We present in this paper a fast Wiener-filter 
extra-regularized with Krylov methods as we will see below (see 
tableHfor a summary of different Krylov methods). 

Intrinsic primordial non-Gaussianities can be imprinted in the 
seed fluctuations depending on the particular theory responsible for 
the amplification of the fluctuations coming from the early Uni- 
verse. To find such deviations, non-informative priors, which give 



non-linear estimates for the underlying signal are required. En- 
tropic priors are well suited here, and have been previously applied 
for CMB studies. We extend this work for LSS reconstructions and 
develop the corresponding maximum entropy method for Gaussian 
and Poissonian likelihoods (see section [2.5.9| and appendix|Il(. 

Sampling methods have the advantage of determining the 
shape of distributions and, thus, leading to a natural estimate of 
the uncertainty of the estimator. Moreover, the mean can be calcu- 
lated easily from the sample and is known to give more accurate 
results than t he maximum in the case of asymmetric PDFs (see e.g. 
lTanneilll996h . 

As an example, iHobson & McLachlad l [2003h proposed a SZ- 
cluster detection algorithm using the Metropolis-Hasting algorithm 
method based on a Poissonian pri or distribution, which is designed 
to find discrete objects. Recentlv ISutton & Wan delt ( 2006) devel- 
oped a reconstruction method for radio-astronomy that samples 
from the multiplicity function (see eg. 134b. Alternative approaches 
to the maximum likelihood for CMB-mapping algorithms try to 
jointly reconstruct the CMB-map with its power-spectru m using 
Gibbs- sampling t echniques (Wandelt et al. 2004; O'Dwv er et al.l 
I2OO4I ; lEriksen et al.ll2007l) . This approach is especially efficient 
with respect to other MCMC methods because the transition prob- 
ability matrix moves the system in each step of the ch ain. For thi s 
special case the importance ratio is always one (seee.g. lNealll993h . 
This MCMC method requires, however, the complete knowledge 
of the full conditional PDFs in order to sample from them. Note, 
that the Gaussian prior for the signal simultaneously represents the 
likelihood for the power-spectrum given the signal, which in this 
case is an inverse Gamma function for the power-spectrum (see sec- 
tion |2.6.2T (. This distribution naturally samples the power-spectrum, 
which strongly deviates from Gaussianity. 

With the aim of estimating the power-spectrum in an objective 
way, non-informative priors are used. Usually a flat prior is taken 
for the power-spectrum. Alternatively, Jeffrey's prior, for which we 
give a derivation based on Fisher information (see appendixU]), can 
be used. Alternatively, an entropic prior could also be taken. 

Other attempts have been made to estimate the power- 
spectrum from the LSS based on the distribution of galaxies. A 
modified Gaussian PD F with a log-n ormal mean has been used 
in this approach (see IPercivall |2005| ). The same kind of con- 
cept, using a modified Gaussian distribution to sample deviations 
from Gauss ianity, has been applied to SZ-cluster detection by 
IPierpaoli &" Anthoine (2005). 

In this paper we propose to apply a Gibbs-sampling algorithm 
to jointly sample the underlying three-dimensional density field 
with the power-spectrum and the peculiar velocities, which can be 
used to correct for the redshift-distortions. Note, that the peculiar 
velocities can also be used to trace the initial density fluctuations 
back in time as we will discuss below. 



1.3 Algorithmic development 

In this paper we focus our work on the numerical optimization of 
inverse techniques to show that a joint estimation of the LSS matter 
density field and its parameters is feasible (see sections[3]&|4ll. 

The calculation of the reconstructions, either through maxi- 
mization or through sampling, requires the inversion of certain ma- 
trices. For the Wiener-filter, for instance, the reconstruction prob- 
lem consists in one of its steps on the inversion of the correlation 
matrix of the data. The methods used in this field so far calculated 
this matrix and inverted it mainly using the Singular Value Decom- 
position algorithm that scales as 0{n'^) for a.nxn matrix (see e.g. 
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IZaroubietal]|l995h . However, this approach seems to be hopeless 
in light of the overwhelming amounts of data coming from differ- 
ent surveys and the possibility of combining them. We made spe- 
cial effort to implement an algorithm in which the involved matri- 
ces would not need to be stored taking advantage of an operator 
formalism, which we worked out here for different reconstruction 
methods (see table[3]and section [331 >. Such a formalism also allows 
fast iterative numerical methods that speed the inverse step up to a 
scaling of Oinlog^ n) thus reducing the main operations to fast 
Fourier transforms (FFTs). Some of these numerical schemes have 
been used in CMB-mapping algorithms, but were lacking a detailed 
comparison of the efficiency of the different methods. Such a com- 
parison is presented here. We derive the different inverse methods 
in a unified way starting with a Bayesian motivation for iterative 
schemes (see appendixO and following with a general formulation 
of the asymptotic regularization from which the Jacobi, the Steep- 
est Descent, and the Krylov methods are derived. Moreover, non- 
linear inverse methods are discussed, like the Newton-Raphson, the 
Lanweber-Fridman and the non-linear Krylov methods. Precondi- 
tioning (see appendix ImJ was taken into account in all the deriva- 
tions and the importance of such a treatment is tested in section 
In addition, a previously not discussed Krylov method is de- 
rived (see formula riOOl section[3]and appendix[Kl and its superior 
efficiency is demonstrated (see section|4ll. 



1.4 Structure of the paper 

This paper is structured as follows: in section (|2} we state the 
problem of signal reconstruction, then we define the data model. 
Subsequently, we introduce a general statistical perspective within 
a Bayesian framework from which different solutions to the re- 
construction problem are presented, including Wiener-filtering, the 
COBE-filter, a novel GAPMAP algorithm with a Poissonian like- 
lihood and a Gaussian prior, Jeffrey's prior and the Maximum 
Entropy method (MEM). Markov Chain Monte Carlo methods 
(MCMC) that sample the global probability distribution function of 
the signal and all underlying parameters are presented as the ideal 
approach to achieve a full Bayesian solution of the reconstruction 
problem. In the numerical method section (jSj, different iterative 
inverse schemes which have been implemented in ARGO are pre- 
sented, including a very efficient novel scheme. The operator for- 
malism is worked out for four novel algorithms in large-scale struc- 
ture reconstruction. The efficiency of the different inverse schemes 
is tested with the Wiener-filter under different reconstruction cases 
with synthetic data, including structured noise, blurring, selection 
function effects, and windowing in section Particular detailed 
derivations are presented in the appendix. 



to be modeled in a first step. The reconstruction problem is classi- 
cally seen as the inverse of this functional dependence. The solu- 
tion to this problem is far from being trivial and essential issues, 
like solution existence, solution uniqueness, and instability of the 
solving process, have to be considered. Regarding the solution ex- 
istence, there will be no model that exactly fits the data, since the 
mathematical model of the physics of the system is approximate 
and the data contain noise. That forces us to look for optimal so- 
lutions, rather than exact solutions. We will have to deal especially 
with the last two points mentioned above, uniqueness and stabil- 
ity, because an infinite set of possible solutions can fit the data and 
because of the ill-conditioned character of the system we are treat- 
ing. A regularization method that stabilizes the inverse process by 
imposing additional constraints will be required. We show below 
how the Bayesian framework permits us to do a regularization in a 
natural way and furthermore to jointly estimate the signal and its 
parameters. The calculation of the Bayesian estimators will require 
extra-regularization techniques, which will be presented in section 
([3}. We will start posing the inverse problem by defining the model 
of the data. 



2.1 Data model 

The galaxy formation process is known to be a complicated, non- 
linear and probably non-local process, as mentioned in the intro- 
duction. Thus, attempts to invert the galaxy distribution into the 
original DM distribution suppose a great challenge. It is known 
that, given some bias factor between the amplitude of the galaxy 
and the DM fluctuations, the galaxy power-spectrum on large scales 
fits well to the expected DM spectrum predicted from CMB obser- 
vations. Detailed studies reveal that the bias factor is not univer- 
sal, but depends on galaxy type, galaxy formation time, redshift, 
etc. The data model connecting the signal (DM distribution) to our 
observable (galaxy counts) is in consequence complex, non-linear 
and non-local. The main goal of this paper is to develop a Bayesian 
framework that permits one to split the dependencies into separated 
problems, which can then be jointly tackled with physical and sta- 
tistical techniques. In principle, also the bias of the galaxies can be 
sampled (see discussion in the introduction). However, this is out 
of the scope of this paper 

Here we present a linear data model which can easily be ex- 
tended to a simple non-linear data model by a non-linear weighting 
scheme (e.g. by weighting the galaxies according to their apparent 
luminosity). Nevertheless, many of the uncertainties we are facing, 
such as the convolution effects due to the blurring of a telescope, 
the pixelization scheme, the mask effects due to the observation 
strategy, or the selection effects due to the limited sensitivity of the 
detectors, can be described with a linear model. This linear model 
will contain non-linear information in the noise term. 



2 BAYESIAN APPROACH TO SIGNAL 
RECONSTRUCTION 

The reconstruction of a signal (here: DM distribution) given a set 
of measurements (here: galaxy catalogues) is usually a highly de- 
generate problem, as we have discussed above, where the signal 
is under-sampled and modified by systematic and intrinsic errors 
due to the nature of the observable. This is indeed the situation that 
we are facing, since most of the galaxy redshift surveys have par- 
tial sky coverage and the discrete nature of galaxies introduces shot 
noise. 

An expression for the data as a function of the real signal has 



2.1.1 Linear data model 

The general linear reconstruction problem formally can be written 
as the inverse problem of recovering the signal s from the observa- 
tions d related in the following way 

d{x) = J dyR{x,y)s'{y), (1) 

where R represents the kernel of the Fredholm integral equation 
of the first kind defined by (TJ, with noise on the signal s being 
expressed by the superscript e. Discretizing eq. (TJ and assuming 
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additive noise, we can formulate the signal degradation model by 

d = Rs + e. (2) 

where the m x 1 vector d represents the data points resulting from 
the measurements (here: galaxy counts), the statistical noise and 
the underlying signal are a m x 1 vector e, and a n x 1 vector s 
respectively. The object that operates on the signal is R a m x n 
matrix which commonly describes blurring effects caused by the 
atmosphere, the point-spread function (PSF) of the telescope or the 
response function of the detectors of the instrument. 

Let us denote the physical observation process encoded in the 
R-matrix as Rp . We are interested in the selection function of the 
survey /s with the corresponding masks /m, which can also be in- 
cluded in R. One has to be careful with the data model defined in 
eq. |2] As several authors point out, there is a correlation between 
the underlying signal s and the level of sh ot noise pr oduced by the 
discrete distribution of galaxies (see e.g. ISeliak|[l99a) . Since, by 
definition, additive noise assumes no correlation with the signal - 
otherwise we would have signal content in the noise - we define the 
effective noise e as the product of a structure function /gp, which 
could be correlated with the signal, with a random noise component 
(en) that is uncorrelated with the signal. Given the above defini- 
tions, the effective noise e is uncorrelated with the signal. We may 
then rewrite eq. ([2j in continuous representation as 

d{x) = I AyRp{x,y)fs{y)fM(y)s(y) + /sp(s(a;))eN(a;), 

(3) 

where R{x,y) = Rp{x,y)fsiy)fMiy) and e{x) = 
fsF{s{x))e]si{x). In practice, we will assume white noise (i.e. con- 
stant noise in Fourier space), en = ewN . However, none of the pre- 
sented techniques in this paper depend on this simplification. Some 
of the previous studies of large-scale structure reconstruction also 
included the inverse of the lin ear redshift-distor tions operator as a 
matrix multiplying R (see e.g. lLahav et al J 19941) . Such an operator 
cannot easily be found for the non-linear regime. Earlier works try 
to correct the non-linear redshift-distortions with an additio nal fac- 
tor in the power-spectrum analogous to Kaiser's factor (see lKaiseJ 
11987 : Ballinger et al., 1996 : ,Erdogdu et al. 2 004). Here, we propose 
a Bayesian solution to the signal reconstruction problem as it will 
be discussed later. 

In most cases, the signal will be strongly under-constrained 
due to under-sampling, i.e. n 2> m, which is nearly unavoidable 
due to partial sky coverage of surveys. The linear equation (eq.[2} 
to be inverted is a rank-deficient system. Such systems are char- 
acterized by non-uniqueness, since the matrix R has a nontrivial 
null space. By superposition, any linear combination of the null 
space models (models so that satisfy Rso = 0) can be added to a 
particular solution leading to infinite solutions. Consequently, we 
cannot discriminate b etween situations where the solution is truly 
zero (see for example [Aster et alll2005l) . As is well known, a direct 
inversion of eq. ^ (R~^d) will am plify the st atistical noise and 
lead to an unstable solution (see e.g. lZaroubi et all 19951) . Instead, 
a regularization method, which often follows several steps, has to 
be applied . The first step consists of finding an expression for an 
estimator of the signal s that approximately satisfies the data model 
(eq.[2ll and copes with the noise. Further regularization methods are 
usually required in a second step to actually calculate the estimator. 
This happens whenever some ill-posed linear or non-linear opera- 
tors have to be inverted. We shall distinguish between noise regu- 
larization and inverse regularization according to the first and the 
second step, respectively. As Zaroubi et al. ( 1995) pointed out, us- 
ing a mean variance estimator alone does not completely solve the 



inverse problem. Therefore, they proposed the singular value de- 
composition algorithm (SVD) to extra-regularize these problems. 
However, this method requires one to calculate the correlation ma- 
trix of the data implying a slow algorithm, scaling as 0{n^), and 
needs large storage facilities. We will show that a Bayesian ap- 
proach is a natural regularizer for the noise, which then can be reg- 
ularized further for the inverse purpose with efficient methods that 
scale as C(n logj n) (see section^. Let us address the problem of 
signal reconstruction from a statistical inference perspective. 



2.2 Inversion via statistical estimator 

In parametric modeling it is assumed that observational data have 
been generated by random processes with probability density dis- 
tributions, depending on the model parameters (see for example 
[Robert 2001). Statistical analysis in this context is essentially an 
inverse method, which aims at retrieving the causes (here reduced 
to the parameters of the probabilistic generating mechanism) from 
the effects (here summarized by the observations). 

Traditionally, one tries to find a way where the available in- 
formation is optimally used and a unique estimator is selected from 
an infinite set of solutions. One of the classical approaches consists 
of minimizing the variance of the residuals, which is the variance 
of the discrepancy between the estimator and the set of possibl e 
realizations consistent with the data (see iRvbicki & Presslll992t) . 
This conjecture is reasonable because the least deviation from the 
set of true signals is searched. The estimator obtained in this way 
is called the least squares quadratic (LSQ) estimator. However, a 
transparent statement of the statistical assumptions is missing in 
this method, contrary to the Bayesian approach used in this work 
as will be shown below. Moreover, Bayesian statistics allows sam- 
pling the PDF of the system under consideration in a natural way. 
Strictly speaking, one does not look for a unique estimator in this 
framework. Nevertheless, a summary of the PDF can be given by 
the mean of the sample (see section |Z6t . 

The most general approach to determine an estimator, how- 
ever, should be based on the global (joint) PDF over all relevant 
quantities, like the signal s and all model parameters p, without ne- 
glecting any possible dependences. Let us assume that P{s,p \ d), 
the joint PDF of the system under consideration, depends on the 
signal s and a series of additional parameters p, given the observa- 
tions d. One solution would then be to calculate the expectation of 
the signal over the joint PDF space 

Ejoint(s) = J dsdp [P(s,p \d)s^= (s)(3_p|£i), (4) 

where we have introduced the ensemble average with 
the subscript representing the PDF over which the integral is done 
P{s,p \ d) ^ {s,p \ Expression Q can consequently be 
read as the ensemble average over all possible signals and parame- 
ters. The joint PDF is unfortunately quite hard to calculate directly, 
and the integral in eq. ^ is computationally too expensive for re- 
alistic cases as it involves many parameters and a large amount 
of data. To disentangle the uncertainties in parameter and signal 



Sometimes, however, the ensemble angles will denote the estimator of 
some signal or parameter in a more general sense, like the maximum like- 
lihood or the maximum a posteriori (see sections l2!4l and [231 respectively). 
Note that a bracket formalism could be introduced at this point, in which 
eq. (4) would be represented in the following way: (s|s|p, d). 
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spaces, let us apply the product rule of statistic|3 to eq. Q 
Ejoint(s) = JdpP{p\d) J ds\p{s\p,d)sj 

= Ep[Es (s I p,d) I d] = ((s>(3|p^^)>(pi^j. (5) 

This means that the expectation of the signal s corresponds to the 
average of the conditio nal mean of s over t he marginal distribution 
of p (see for example iGelman et alj|2004b . where the conditional 
mean is given by 

Econd(s) = Es(s \p,d) = j As \p{s I p,d) s] = (s)^g|p ,^j. 

(6) 

Traditionally, the conditional PDF has been used to determine the 
esti mator of the signal as suming that all the parameters are known 
(e.g. IZaroubi etal]|l995h . 

As the reconstruction step of the density field is computation- 
ally expensive, a joint estimation of the parameters is out of scope. 
Therefore, the reduced approach of basing the estimators on condi- 
tional PDFs provides a computationally more feasible way to tackle 
problems of this kind. In particular, we will demonstrate that an op- 
erator formalism allows efficient sampling of the conditional PDFs, 
enabling us to sample the joint PDF in a Bayesian framework. 



2.3 Bayesian approach 

Given a data model, one can usually find an expression for the sam- 
pling distribution, i.e. the probability of obtaining the data given the 
signal and some additional parameters p, P(d | s,p). This is much 
less difficult than a direct calculation of the posterior P(s | d,p). 
We need an expression which relates both the sampling and the 
posterior distribution given by Bayes theorem. The derivation of 
Bayes theorem is straightforward from the joint PDF of the signal 
and the data, using the product rule and the fact that the joint PDF 
is invariant under permutations of its argument![f]. Bayes theorem 
can be expressed by the following equation 



P(s I d,p,I) 



P{d\s,p,I)P(s\p,I) 

P{d\p,l) 



(7) 



where P{s \ p, I) represents the prior knowledge about the sig- 
nal, as it models the signal before any observations occur. The 
PDF given by P[d \ p, I) stands for the so-called evidence that 
is treated as the normalization of the posterior 



P{d\p,I) = J dsPid\s,p,I)P(s\p,I). 



(8) 



It is worth mentioning that all the probabilities are conditional 
to the underlying physical picture, or prior information /. This 
has to be explicitly considered in case of model comparisons. In 
the following sections, we will present the steps for completing 
a Bayesian analysis, starting with the likelihood, then discussing 
the importance of the prior, and finishing with sampling through 
the joint signal and parameter space. Note that different choices 
for these three components (likelihood, prior, and sampling) lead 



P{s, p\d) = P{s\ p, d)P{p I d) 



to different classes of reconstruction algorithms. An overview of 
the different reconstruction scheme implementations based on this 
classification can be found in table ([TJ. 



2.4 The likelihood 

The likelihood function is formally any f unction of the parameters 
9 proportional to the sample density (see lTanne3ll996h 



£(0 I d) (X P{d I 9) 



(9) 



Many inference approaches are based on the likelihood function, 
justified by the likelihood principle, which states that the informa- 
tion obtained by an observation d about 9 is entirely contained 
in the likelihood function £{9 \ d). To be specific, if di and d2 
are two observations depending on the same parameter 6 such that 
there exists a constant c satisfying Ci{9 \ di) — cL-^iQ \ ^2) for 
every d\ and d^ then bring the same information about Q and 
must hence lead to identical inferences (see Robert 20C)lj). 

Maximum likelihood (ML) methods, for example, rely on the 
likelihood principle with an estimator of the parameters given by 



(0)ml = argsupg L(9 \ d), 



(10) 



i.e., the value of 9 that maximizes the probability density at d. 
Bayesian methods take also advantage of the likelihood principle 
incorporating the decision-related requirement of the inferential 
problem through the definition of a prior distribution (see section 
I2.5t . The definition of the likelihood is the first step in a Bayesian 
framework to determine the posterior distribution (see eq. |7). In 
using galaxy redshift surveys to trace the matter distribution, we 
have to deal with the discrete nature of the data sample. Thus the 
likelihood may be derived here for Poissonian statistics. 



2.4.1 Poissonian likelihood 

The likelihood of our galaxy distribution may be approximately 
represented by a Poissonian distribution (the real statistics should 
describe the much more complex galaxy formation process). Un- 
der the assumption of independent and identically distributed {iid) 
observations, this yields 



C{s I d,p) oc 

m 

p{d\s,p) = n™p( 



(11) 



{Rs')^ + CA 



[(Rs'), + C,]" 
d'\ 



where d'^ are the galaxy counts per cell i and the real, positive sig- 
nal of the expectation value of the number of galaxies is given by 
s[ = ^(l + bsi), with Si = 5 pi = '''■I'' the DM over-density, 
our target signal. The quantity rig stands for the mean number of 
galaxies, p represents the mean density and b the bias factor. All 
these quantities are redshift-dependent. The additional parameters 
p in this case would be represented by some background a and 

would enter into the operator R that modifies the sig nal s. 

For a similar applica tion in astronomy see iLahav & Gulll 

l ll989h and lRobinsonI Jl99lh . If d'i is not an integer, e.g. due to some 
interpolation process, a Gamma function may be used instead of the 
factorial, d-! ^ T{d', + 1). 



P(s, d, p,I) 
P(d,s,p,I) 



P{s I d,p,I)P(d\p, I) : 
P(d I s,p,I)P{s I p,I) 



2.4.2 Gaussian likelihood 

When the number of counts is large the Poisson distribution can 
be approximated by the normal distribution. In that case, the likeli- 
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hood can be given by a Gaussian distributed noise 
C{s I d,p) oc 



P{d\s,p) = 



1 



[(27r)™det(N)]i/2*'''P 



oc exp 



1 2^ ^ 



(12) 



wliere N = (ee^){g|p) is thie covariance matrix of thie noise 
e = d — Rs, and 



X 



\s) = (d-Rs)'''N"^(d-Rs). 



(13) 



The parameters p determine the structure of the noise e (and there- 
fore the structure of the covariance matrix N), and also enter into 
the operator R. We give different expressions for the noise covari- 
ance matrix N in section J3.3I ). 

Note that coincides with the square of the Mahalanobis 
distanc^l between d and Rs, and also coincides with the squared 
N~^-norm of the error 



(14) 



In this case, the ML will correspond to the least squares of the error. 
It will minimize the x^{s) and hence minimize the Mahalanobis 
distance between the data and the noise-free data model. Therefore, 
the ML is equivalent to searching the estimator that fits the data 
better without constraining the model for the signal. Let us study 
the prior that precisely sets constraints on the signal s. 



2.5 The prior 

A second step in Bayesian analysis is to specify the prior distribu- 
tion for the signal, which contains the prior knowledge about the 
signal before the measurements were carried out. For little infor- 
mative data it can strongly affect the posterior distribution and thus 
modify any inference based on it. For this reason, frequentists crit- 
icize Bayesian methods as being subjective. Other definitions of 
probability, like the frequentist, however, can be shown in most of 
the situations to be particular cases of the Bayesian approach (see 
e.g. 'Tanner 1996), implying the use of an implicit prior. The advan- 
tage of defining the prior knowledge about the system under con- 
sideration is that the interpretation of the results is straightforward, 
especially because assumptions flowing into the inference proce- 
dure are clearly stated. Once the prior is defined, we can obtain the 
maximum a posteriori (MAP) estimator, by maximizing the poste- 
rior distribution, which is proportional to the likelihood multiplied 
by the prior. 



{Oh 



arg supg P(6 \ d). 



(15) 



Note that there is a crucial difference to the maximum likelihood 
estimator (eq.llOl) due to the incorporation of the prior information. 



(logP(0 1 d) cx log(P(d 1 0)P(0))) 

Q = logP(d| 0)+logP(0). 



(16) 



If we assume that the error is Gaussian distributed, (which is a fair 
assumption if there is no prior information about the noise), and we 
parameterize the prior of the parameter, say the signal s, we can 
rewrite eq. M6\ as (2Q Q) 

Q = -x'(s) + a/p(s), (17) 

where we absorbed the factor 2 in the Lagrangian multiplier a, 
and /p represents the penalty function that obliges the estimator to 
fulfill some constraint on the parameter s, to the detriment of the 
X^(s) that strongly relies on the data. If we further assume that 
N"^ = I (say we have white noise), the Mahalanobis distance re- 
duces to the Euclidean distance 

(OMah(rf,Rs)|jy-i^j = D|^^(d,Rs)), and the quantity one 
wants to minimize reads 

||ef+Q/p(s), (18) 

where we have absorbed the minus sign in a. Expression l |18l l 
is equivalent to least squares with a regularization term, and be- 
longs to Ridge-regression problems l lHoer]|l9^ . lHoerl & Kennard 
Il970l) . Assuming that the penalty function takes the following form 
/p(s) = we can write expression l llSt as 



(19) 



which then b ecomes the Tikhonov regularization method 
l lTikhonov|[l963h . The parameter a is called the regularization pa- 
rameter. These methods lea d to linear fi lters and are essentially 
identical to Wiener-filtering l|Fostejl96lh. which will be presented 
in the next section. Note that Tikhonov regularization is equiva- 
lent to MAP of a Gaussian likelihood with noise covariance matrix 
N = I and Gaussian prior, with signal covariance matrix S = oi~^\. 
Nevertheless, the penalty function /p in general can be a non-linear 
function of the parameter to be estimated (say the signal s) leading 
to non-linear estimators. We will introduce MEM as such an exam- 
ple. Tikhonov regularization can also be generalized to non-linear 
problems by introducing a non-linear kernel operator R(s). 

Summarizing the exposed theory of signal reconstruction, we 
might interpret the likelihood as some distance measure between 
the data and the noise-free model of the data, and the prior as some 
constraint that tightens the estimator to the model of the signal. We 
have shown here that the classical methods of signal reconstruc- 
tion, like the Tikhonov regularization, are particular cases of the 
Bayesian approach. The inclusion of a prior can be regarded as a 
natural regularization, in the sense that the regularization term is 
provided by a (physical) model of the true signal. In appendix|L] we 
discuss the relation between other regularization methods and the 
Bayesian approach. In the following sections we introduce differ- 
ent priors that are relevant for large-scale structure reconstruction 
and are implemented in ARGO. 



2.5.1 Bayes and regularization methods: the prior as a 
regularizer 

Looking at the log-probabilities, we see that the MAP esti- 
mator maximizes the following quantity using Bayes theorem 

^ We introduce here a generalized definition of the Mahalanobis distance 
as: D^^^ (x, y)j^ = (x — y)^M{x — y), with x and y being two vectors 
in the A''-dimensional space and M a x Af matrix. 



2.5.2 Gaussian prior 

The distribution of the primordial density field should be very close 
to Gaussianity according to most of the inflationa ry scenarios 
Il98ll ; lLindelll982l ; lAlbrecht & Steinhardt|[l983) . In fact, the mea- 
surements o f the CMB show very small deviations from Gaussian- 
ity (see e.g. iKomatsu et alj|2003h . Non-Gaussianities in the mat- 
ter distribution arose mainly from non-linear gravitational collapse. 
The non-linear regime of structure formation is responsible for the 
strong radial redshift-distortions, the finger-of-god effect, limiting 
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Non-informative priors 
Flat (ML) Entropic (MEM) 



Gaussian 



Informative priors (MAP) 

Poissonian 



Likelihood 



Gaussian 
-Radio 
-CMB 



COBE: Janssen & Gulkis (1992) 
Tegmark (1997) 
ROMA: Natoli et al. (2001) 
MAPCUMBA: Dore (2001) 
MAXIMA: Stompor et al. (2002) 
MAGIC* : Wandelt et al, ( 2004) 
MIRAGE: Yvon & Mavet (2005) 
MADAM: Keihanen et al. (2005) 



Sutton & Wandelt (2006) * 
Maisinger et al. (1 997) 
Hobsonetal. (1998) 



-LSS 



ARGO: MEMG* 
(section l2.5.9l and appendix |j) 



WIENER (Tikhonov, Ridge) 

Bunn & Sugivama (1995) 
Tegmark (1997) 



MAGIC*: Wandelt et al. (2004) 
P'Dwver et al. (2004)* 
Eriksen et al. (2007) * 
Larson et al. (2007) * 
Fisher et al (1994) 
_Hoffman(1994) 
Lah av et'al. (1994), Lahav (1994) 
Zaroubietal. (1995J 
Fisher etal. (1995) 
Webster etal. (1997) 
Zaroubi et al. (1999) 
Schmoldt et al. (1999) 
Erdogdu et al. (2004,2006) 
ARGO: WIENER"* 
(sections l2.5.3llT6l l4land appendix IB^ 



Hobson & McLachlan (2003) * 



Poissonian 



Richardson (1972) 
Lucy (1974) 



ARGO: MEMP* 
(section l2.5.9l and appendix|l) 



ARGO: GAPMAP* 
(section [2.5.4l and appendix IeI 



Inverse Gamma 
-CMB 



-LSS 



MAGIC* : Wandelt et al. (2004) 
O'Dwver et al. (2004) * 
Larson et al. (2007) * 
Eriksen et al. (2007) * 
ARGO** 
(section |2.6.2) 



Modified Gaussian 
-CMB 
-LSS 



Pierpaoli & Anthoine (2005) * 
Percival (2005) * 



'developed and presented in this paper; ** developed, tested and presented in this paper; *able to sample PDFs 



We have left out the reconstruction methods that are focused on the cosmological initial conditions, since they address a different problem and, in general, 
cannot be classified in terms of the PDFs listed in this table. Neither can other reconstruction algorithms based on geometrical arguments, 
like Voronoi, Delaunay tessellations, friends-of-friends schemes or cloud-in-cell interpolation schemes, be classified here. 
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the accuracy of reconstructions. Previous attempts to correct for 
these distortions have modifie d the power-spectrum by introducing 
a Lor entzian factor (see e.g. iBallinger et ^1 19961 : lirdo gdu et al.l 
|2004|) . In section 

we propose an alternative way to do this in 
a Bayesian framework, where peculiar velocities are sampled to- 
gether with the three dimensional map of the matter distribution. 
For the underlying DM density fluctuation we will assume a Gaus- 
sian prior. This is a crude approximation for the density field at 
the present epoch of the Universe, especially on small-scales. It is, 
however, a valid description on large-scales and allows to incorpo- 
rate non-linear corrections in a MCMC fashion, as will be discussed 
in section l |2.6l ). Following Bardeen et al.iCl986) we may thus write 
the PDF of the signal as a multivariate Gaussian distribution 

I ^) = [(2.)Met(S)]V. ^-P (4^'^"^) ' ^20) 

with S being the covariance matrix of the signal 
(S = S(p) = (ss^)(s|p))- This formula emphasizes the high 
dimensional character of the problem (n dimensions of the signal 
reconstruction, with n being typically between 10^ and 10^). 

2.5.3 Gaussian prior and Gaussian likelihood: the Wiener-filter 

The Gaussian prior together with the Gaussian likelihood lead to 
the Wiener-filter, completing the square for the signal in the ex- 
ponent of the posterior distribution (see .Zaroubi et al. (.1995h and 
appendix IaI. 

P{s I d,p) 

oc exp ^-i pS"^s + {d- Rs)'''N"^(d - Rs)]^ 

cx exp ^-i \{s - (s)wF)^(fTwF^)~^(s - (s>WF)j^ ,(21) 

where the Wiener-filter used to calculate the estimator from the data 
(s)wF = FwFci is given by 

FwF = (S^^ + R'''N"^R)"^R'''N"\ (22) 

and the corresponding covariance is 

(t|vf = {T-r-^>wF = (S"' + R^N^'R)"', (23) 

with r = s — (s)wF being the residual. A similar filter to the 
Wiener-filter can be obtained by the LSQ estimation [j] (for an ex- 
plicit derivation see Zaroubi e t aL.1995i . and appendix |B]l leading 
to the following expression 

(s>LSQ = (sdt)(dd^>-'d, (24) 

where the correlation matrix of the signal and the data ({set')) is 
multiplied by the inverse of the autocorrelation matrix of the data 
{{dd^)~^). Given that the signal and the noise are uncorrelated 
((se^) = 0), the correlation matrix of the signal and the data re- 
duces to: (sd^) = SR^. Thus, the filter in eq. J24b can be reformu- 
lated as 

Flsq = SRt (RSRt + (N) (s IP) ) - ' . (25) 

Note that in this case, the least squares are referred to the residuals 
r, i.e. the difference between the real signal s and the estimated signal 
(s)lsq: 1 1^1 P = I |s — (s)lsq I p. where the prior on s is given in a more 
impHcit way by assuming a linear relation between the estimator and the 
data and statistical homogeneity. 



The noise covariance matrix for the LSQ estimator will differ from 
the one in the likelihood, if there is a signal dependence in the struc- 
ture function of the noise term as it is the case for a Poissonian-like 
distribution. 

From the structure of the LSQ filter Flsq (eq.|25b. one could 
postulate another expression for the Wiener-filter given by: 

FwF = SR^ [RSR^ +Ny\ (26) 

We show in appendix l[Cj that both expressions for the Wiener-filter 
(eqs.l22land|26t are equivalent. From now on, we will call eq. l |26l l 
the data-space representation of the Wiener-filter, and eq. i22\ the 
signal-space representation of the Wiener-filter. Note, that the LSQ 
estimator will coincide with the Wiener-filter after performing an 
ensemble average over all possible signal realizations: (s)lsq ~ 
{{s)wf)(s\p)- 

The following notation can be introduced for the posterior 

PDF 

P{s \ d,p) QC G{s ~ {s)wF,crwF), (27) 

i.e. given a dataset d derived from a Gaussian process, the possi- 
ble signals are Gaussian distributed around the Wiener-filter recon- 
struction (s)wF with a covariance ct^f- The parameters p enter 
the operator R, including also the cosmological parameters that de- 
termine the signal covariance matrix S. We will discuss in section 
l |2.6t how to sample S and to determine cosmological parameters. 

A remarkable characteristic of the Wiener-filter is that it sup- 
presses the signal in the presence of a high noise level resulting in 
the null estimator and gives just the deblurred data when noise is 
negligible. In this sense it is a biased estimator, since its covariance 
matrix has less power than the original one. Some attem pts have 
been made to derive an equivalent unbiased estimator (see'Zaroubil 
2002). However, one might be especially interested in obtaining a 
conservative est imator. Sampling the joint PDF will fill the missing 
modes (see e.g. IWandelt et al.ll2004h and in this way complete the 
signal in regions where it is under-sampled or the signal to noise 
ratio is low. It is interesting to note that the Wiener Filter coincides 
with the MAP estimator in the case of a Gaussian prior on s and a 
Gaussian likelihood ((s)wf = (s)map). Performing the integral 
of the conditional PDF (see eq. |6) one obtains the same estimator 
again, thus (s)wf ~ {^}(s\dp)' This is a very important result, 
since it permits one to sample the conditional PDF. We propose to 
exploit this property for the joint estimatio n of the signal and its 
power- spectrum as is done in the CMB (see lWandelt et alj ( l2004h 
and section l2.6.2t . 



2.5.4 Gaussian prior and Poissonian likelihood: the GAPMAP 
estimator 

The Gaussian likelihood constitutes a valid approximation when 
the Poissonian character of the distribution is appropriately mod- 
eled in the noise correlation matrix N. However, one would rather 
describe a discrete sampling process like a galaxy survey with a 
Poissonian likelihood. Unfortunately, there is no filter available for 
such a case. Thus, we present a novel iterative equation for the 
MAP estimator with a Gaussian prior and a Poissonian likelihood, 
which we call GAPMAP (see appendix |E]for a derivation) 

s^ + ^ = SR'' brq ^-1 + diag (Rn^{l+bs^) + cj ^ d'^ . 

(28) 
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2.5.5 Flat prior 

With the aim of deriving objective posterior distributions, non- 
informative prior distributions are introduced. A non-informative 
prior would suggest that any value is reasonable. Flat priors where 
the probability distribution is assumed to be constant P{s) = 
const are thus very often applied. Note, however, that these are 
improper priors, since the integral of these distributions diverges to 
infinity. In this case, the posterior is proportional to the likelihood. 
The maximum likelihood solution coincides in this way with the 
MAP estimator assuming a flat prior ({s)ml = (s)MAplflat)- 



2.5.6 Flat prior and Gaussian likelihood: the COBE-filter 

In CMB map-ma king algorithms it is common to use the s o-called 
COBE-filter (see Ijanssen & Gulkis|[l993 : lTegmarklll997l) . which 
can easily be derived by maximizing the likelihood given in eq. il2\ 



FcoBE = (R+N-'R)-'RtN~' 



(29) 



This filter has the property that among all unbiased linear estima- 
tors (with a noise o f zero mean), it leads to the minimum variance 
jNatoli et alj|200ll) . Here unbiased means that the statistical mean 
of the estimator is equal to the true signal. This is, however, only 
fulfilled when the inverse of R^N~^R exists (see appendix|G]l. The 
covariance for the COBE-filter can found to be 



o-coBE = (r-rt)coBE = (R+N-'R) 



(30) 



Note that, in general, the following relation holds: (T^p fcoBE> 
as a comparison to eq. J23t shows. 

pTegmarkl il997i) claims that several linear filters like the 
COBE or the Wiener-filter conserve information by comparing the 
Fisher information matrix corresponding to the filtered signal to the 
one of the un-filtered time ordered data. This property apparently 
permits one to perform cosmological parameter estimation from 
the reconstructed signal after filtering the data. However, linear fil- 
ters conserve information only if they are invertible, which is not 
provided for realistic cases as we show in appendix |Hl A consis- 
tent estimation of cosmological parameters has to be done in a full 
Bayesian framework by estimating the joint PDF of the signal an d 
the parameters, as we will see in section i2.6\ jWandelt et al.l2004h . 



2.5. 7 Flat prior and Poissonian likelihood: the Richardson-Lucy 
algorithm 

A widely used deblurring algorithm in astrono my and medical to- 
mography is the Richardson-Lucy algorithm jRichardsonl 1 19721 : 
ILucvII1974 !). which was shown to be the maximum likelihood so- 
lution with a Poissonian likelihood bv lShepD&\^ h982l) . We 
show the derivation in appendixlF] as a simplified case with respect 
to eq. {2i\ . The Richardson-Lucy algorithm cannot prev ent seri- 
ous n oise amplifications in the restoration process (see e.g. lCarassol 
I1999I) . This is a natural consequence when a prior that regularizes 
the solution is missing. A toy application is presented in section 



2.5.8 Jejfrey's prior 

Other non-informative priors have been suggested based on in- 
variant statistical structures under transformation of variables in a 
Bayesian formalism. Considering a one-to-one transformation in 



the one-dimensional case of the parameter: (p — f{9), the equiva- 
lence between the respective prior densities is expressed by 



p(<^) = p{e) 



de 



p{e)\f'{e)\ 



(31) 



This relation is satisfied by Jeffrey's prior P(^) oc [J(^)]^''^, where 
J{6) is the Fisher informatior[f| 



91ogP(d|( 



961 



92 logP(d|( 



)(d\0) - ' ' hd\0), 

(32) 

and where we have assumed the following regularity condition 

/ dd^P{d I 6*) = 0. Relation ([SD can be proved easily by do- 

/ a^ iogp(d|0) \ 
"\ fl^P /( 



ing the evaluation J{(j>) 



){d\e) 



j{e) 



d£ 

I d0 



(see e.g. lOelman et al.ll2004h . Note, however, that in the multidi- 
mensional case, Jef frey's prior may lead to incoherences o r even 
paradoxes (see e.g. Berger & Bernardo! IT993 : iRobertI 12001 ). Jef- 
frey's prior is applied adequately, when not even the order of 
magnitude of the parameter to be estimated is known a priori. 
We derive Jeffrey's ignorance prior for the 3-D power- spectrum 
(S = diag(Ps(fe))]j in appendix |I] (see section 12.6.21 for an ap- 
plication of this prior). 

2.5. 9 Entropic prior and Maximum Entropy method 

Another approach searches the least informative model compatible 
with the data using a prior based on Boltzmann's definition of en- 
tropy (or equivalently, Shannon's notion of information, see 
iShannoa.l948i) . 



P(s I p) — exp(a5' 



(33) 



and maximizing the resulting posterior distribution, being a some 
constant, and s the so-called hidden image (or signal). This infer- 



(Javneslll96l 


1968 


; FriedeiJ 19721: IGuU & Daniell 19781; Gulll 


119891; ISkillind 


1989 


: 'Maisinger et al."l997': 'Hobson et aljfl998b. 



For a review see Naravan & Nitvananda ( 1986). From now on we 
will represent the underlying signal by s in the framework of MEM. 
The MEM can be considered as MAP estimation with an entropic 
prior 

The particular expression for the entropy depends on the sta- 
tistical formulation of the non-informative prior. Let us think of a 
positive signal as a grid with q cells, with each cell i having a cer- 
tain intensity value Si, i = 1, . . . ,q, with an uncertainty on each 
value given by ±q~^. Then we define some discrete quanta rii on 
each cell related to the intensity through the uncertainty: rii — asi. 
The signal can be guessed by distributing the rii quantas in the 
grid. In this way, the image is modeled in this way analogously 
to the energy configuration space of a thermo-dynamical system. 
If we further demand each cell to be iid, the number of ways this 
object can occur is given by the multiplicity 



W - 



(34) 



* The generalization to the multidimensional case leads to the following 
matrix form: Jij(0) = (^ S''-°sP^d\S) aiog^-P(d|0) ^^^^^^ (see appendix 

E). 

^ Here the autocorrelation matrix S is represented in k-space. We will dis- 
cuss this in further detail in section (33). 

Not to be confused by the signal autocorrelation S. 
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with Nq being the total amount of quantas to be distributed in all 
cells (A^q = '^^Ui). The probability of any particular result is then 
given by the multinomial distribution 

P(s' I p) = PFg"^". (35) 

ISutton & Wandeld booeh propose to sample from the multiplicity 
function directly to perform reconstructions in radioastronomy. By 
using Stirling's formula for the factorials (n! ~ n"e~") we can 
write 

log P{s' I p) — —a s'i log s'i + const. (36) 

i 

Comparing this expression with eq. J33l >, we recover Shannon's 
definition of entropy (S+ = log s'^ j^- The expression that 

is commonly used for the entropy is a generalization of Shannon's 
formula by Skilling that can be derived based only on consistency 
arguments within probabilisti c information t heory for positive and 
additive distributions (PADs) jSkillindl 19891) . 

This generalization implies the definition of a Lebesgue mea- 
sure (m) for the integral of some function of the hidden image to 
represent the entropy 

S+{s' I p) = ^ \s'i - mi - s'i log [s'i/mi) j , (37) 

i 

here in its discretized form. Skilling's expression for the en- 
tropy can also be derived by considering a team of monkeys 
throwing balls at q cells at random with Poissonian expectation 
Mi: P(n\tJL) = Hi Mr*s ''V'^i-' where Ui — asi and /i — ami 
(ISkilling'igs"?). For a re view on further expressions for the entropy 
see^olina et al] j200lh . 

The global maximum of over s in the absence of further 
constraints is found to be s' — m. Consequently, m can also be 
thought of as a prior model for the image. However, this expres- 
sion for the en tropy will allow reconstructing positive signals only. 
IZaroubi et al] ll995) propose to define s' — p and m — p^, to 
avoid the possibilit y of having a negative d istribution for s. 

According to lOull & SkillingI ( [l99(i) the MEM can be ex- 
tended to reconstruct distributions, which can be either positive or 
negative, as in the case of density fluctuations. Such distributions 
can be described as the difference between two subsidiary positive 
distributions (PADs) 

s — u — V, (38) 
relative to a common model m 

S±{u, V \ p) = ^2 ~ - Ui log(iii/mi)j 

i 

+ ~ ^"^^ ~ ^og{vi/mi)^ . (39) 

i 

One can see from eq. l[38) that dS±/du = —dS±/dv, hence 
yielding 

uv = . (40) 
From the relations given by eqs. ([38} and ( I40t . it is easy to derive 

it = i(«; + s), (41) 

The "+" symbol in denotes that the definition is only valid for posi- 
tive signals s' . 

The "±" symbol in denotes that the definition is valid for positive 
and negative signals s. 



v^^{w-s), (42) 

with Wi = [si + AmlY^"^ . Using these expressions, the total en- 
tropy can be rewritten as 

I p) = ^ \wi - 2mi — Si log (^{wi + Si)/2mi^ j . (43) 

i 

The Maximum Entropy method gives a non-linear estimator of 
the underlying signal that one wants to reconstruct. This method 
is especially interesting to study devia tions from Gaussianity 
jMaisinger et alJl99llHobson et aljl99^ . It is equivalent to max- 
imize with a Lagrangian multiplier, which includes a penalty 
function given by the entropy. Maximum Entropy in this context 
searches the hidden image that adds the least additional informa- 
tion to the data. 

The quantity we need to maximize is given by 

Q"" {s \ p)^aS''{s\p)+ log £.{s I d,p), (44) 

where the log £, is given by eq. l ll3t or eq. (Ell . The equation we 
want to solve is 

VQ''(s I p) = 0. (45) 

In section l |3.2t , different iterative algorithms to solve this non- 
linear problem will be discussed. The required expressions for the 
gradient of and its curvature for positive and positive/negative 
expressions of the entropy (eqs.|37|and |43ll and for both Gaussian 
and Poissonian likelihoods are presented in appendix|j] 

Note that in the limit of low density fluctuations, i.e. in 
the linear regime, the expression of the entropy reduces to the 
quadratic entropy (eventually with an offset of the origin of s), 
S^{s I p) ~ — Si/2mi. This expression is very similar to a 
Gaussian prior for the signal with a variance given by m. In that 
case Maximum Entropy leads to the Wiener-filter. 

2.6 Markov Chain Monte Carlo: sampling the joint PDF 

The drawback of the maximization methods hitherto mentioned, 
is that they find a unique estimator that is most probably subject 
to the chosen values for the required parameters. As already men- 
tioned, the complete characterization of a system is contained in 
the joint PDF in the product space of possible signals and pa- 
rameters. Thus, it would be desirable to sample from this PDF 
to find the region of highest confidence for our estimator. This 
is possible using Markov Chain Monte Carlo (MCMC). The im- 
portance of sampling from the joint PDF and the viability of do- 
ing that with MC MCs has already been discussed in other con- 



texts in astronomy ( 


Hobson & McLachlanl2003l;|jewell et al.l2004l; 


IWandelt et al.ll2004 


). With the MCMC method, the whole system 



can be moved in its configuration space by updating all variables 
successively in a Monte Carlo fashion, until the system relaxes 
(burns-in) and reaches the highest density region. 

The expectation of the i-th parameter (6i) can be calculated 
by the so-called ergodic average, which is given by the mean of the 
sample 

^ K ^ ^^^^ 

with A^b being the size of the sample drawn once the Markov Chain 
has burned-in. In general, the mean estimator is more reliable than 
the maximum of the distributio n, especially in case s with devia- 
tions from Gaussianity (see e.g. iGelman et al]|2004h . The MCMC 
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method permits one to approximately solve the integral in eq. ([4} 
through expression l l46t . 



2.6.1 Gibbs sampling 

The most straightforwar d MCMC method is the Gibbs sampler 
dOeman & Gemanlll984l) . also known as the heatbath algorithm. 
The Gibbs algorithm samples from the joint PDF by repeatedly re- 
placing each component with a value drawn from its distribution 
conditional on the current values of all other components. This pro- 
cess can be seen as a Markov Chain with transition probabilities TTfc 
for fc = 1, n, 

TVk{e,e') = p(e'k \ {e, -.i^k}) ■Y[s^{o^,e'^), (47) 

where {6i : i ^ k} = (6x, ...,6k-i,9k+i., ■■■,9^) (see e.g. |Neal 
Il993h and Jk is the Kroenecker delta-function. The Gibbs sampler 
starts with some initial values 0^"' = (Sj"', Si"'') and obtains 



new updates 6 



6n^) from the previous step 6 



(j-i) 



through successive generation of values 



qU) 



g(i) 



^'(^1 1 {e. 



(j-i) 



/I}) 



gO-l) 



>2}) 



P(0„ I : i / n}) 



(48) 



In this way a random walk on the vector 6 is performed by mak- 
ing subsequent steps in low-dimensional subspaces, which span the 
full product space. This is similar to individual collisions of parti- 
cles in a mechanical system that drives a many-body system to an 
equilibrium distribution for all degrees of freedom. We are espe- 
cially interested in this sampling method because of its efficiency 
that permits us to tackle large dimensional problems in contrast 
to o ther algorithms , whic h include acceptance and rejection rules. 
See IWandelt et al.l ( l2004h for applications in CMB-mapping and 
power-spectrum estimation. However, in the case where the partic- 
ular distribution function is unknown or cannot be explicitely ex- 
pressed rejection sampling methods will be neces sary (see section 



2.6.3b, like the Met ropolis-Hastings algorithm jMetropolis et al.l 
1953| - |Hastingll97(]|) . 



The MCMC method can be applied to perform simultaneously 
the reconstruction of the density field and the estimation of other 
parameters, such as the power-spectrum, the peculiar velocities, 
the bias, or the comological parameters (see fig.[D- We propose in 
the next sections two novel applications of this method to power- 
spectrum estimation from a galaxy redshift survey and redshift- 
distortion corrections, which can also be used in a joint algorithm. 
Note, that a higher degree of complexity can be achieved in the 
schemes we present here by going beyond linear perturbation the- 
ory or considering higher moments of the density field. 



functions (see e.g.'Eisenstein &Hul999^. Then the following sam- 
pling processes are iterated until the chain burns-in 



P(s I S<^',d) 



P(S 1 s 



(49) 



(50) 



The DM signal is sampled with the following PDF (see section 
[23:2l > 



P{s I S^'\d) cxG(s-FwF(S"Od,o-wp(S 



(51) 



The Wiener reconstruction is known to give a biased estimator, 
which attenuates the power especially for the modes where the 
noise becomes important, as discussed in section ( |2.5.3t . This filter- 
ing effect has to be compensated by adding a fluctuating term with 
statistics according to the correct covariance (see iWandelt et al] 
l2004h 



.0) 



(52) 



To generate the data augme ntation V„J,,^ one has to solve the fol 
lowing set of equations (see lEriksen et aflboOTl) 



(53) 



where xqi and xq^ are two independent Gaussian variates. One 
can show by direct calculation that y^-'^y^ has a c ovariance given 
by CTwF- To stabilize the inversion lEriksen et al.l ( |2007|) suggest 
using the following expression derived from the previous one by 
factorizing the square-root of the power-spectrum 



(54) 



Accordingly, the reconstruction step can be done by solving the 
following set of equations based on the signal-space representation 
of the Wiener-filter (eq|22t 



*WF ^ 



(S' 



sO))i/2 + (s(^))i/2RtN-iR(S(^')^''^) 
0)^l/2RtN-ld. 



(55) 



This allows to perform the inversion for the fluctuating term and 
for the reconstruction in one step. An alternative way, permits us to 
use the data-space representation of the Wiener-filte i^i bv generat- 
ing the fluctuations with a constrained realization (see' SertschingeJ 
,1987. : .Hoffman & Ribak 1991 ; Ganon & Hoffm an. 19931) 



y(J) = g 



— FwF" , 



(56) 



using two auxiliary Gaussian random fields s and e with zero mean 
and correlation (SS^) — S and (ee^) — N respectively. Further 
we set d — Rs + e. This method has the advantage that non- 
linear reconstructions can be obtained with N-body simulation^^ 



2.6.2 Joint signal and power-spectrum estimation: sampling the 
cosmic variance with data augmentation 

The joint PDF considered here is given by the joint PDF of the 
signal and the power- spectrum P{s, S\d). For the initial guess ei- 
ther an expression for the power-spectrum can be applied (see e.g . 
Efstathiou et al. 1992: Pe acock & Dodds|[T994l : ISmith et al.|[l99^ : 
Eisenstei n & Hu 1999), or the power-spectrum of the CMB can be 
taken and calculated for the required redshifts with some transfer 



Note, that the data-space representation for the covaiiance (eq. IC2) is 
not appropriate, due to the inverse of the response operator (see appendix 
0. 

Note, however, that a Gaussian constrained realization is good enough 
for power-spectrum estimation especially when one is interested in the 
traces of the lineal' regime, like the baryon acoustic oscillations, or the grav- 
itational potential for the ISW-effect. Sampling with constrained N-body 
reconstructions requires a much deeper development, since the whole cos- 
mological parameter space has to be scanned. 
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(see Isistolas & Hoffman|[T998h . It can be shown that the term in 
eq. J56b has the appropriate Wiener covariance (see appendix IdI ). 
Each reconstruction step can then be done in one step with the di- 
rect Wiener representation by solving the following equations 



(57) 



The power-spectrum can be sampled by an inverse gamma 
function, whi ch we derive here for the case of the 3D power- 
spectrum (see lWandelt et al.ll2004l . for the analogous CMB case) 



P(S I s) oc P(S)P(s 1 S). 
Assuming a Gaussian signal s (see ea.l20t this yields 



(58) 



P{Ps{k) I s 



exp 



2Ps(fe) 



(59) 

with S — diag(Ps(fc))- The prior P{Ps{k)) can be chosen to be 
flat (P(Ps(fc)) = const) or instead Jeffrey's prior can be used 
(P(Ps(fc)) oc Ps(fe)~^), see section ( 12.5.8b and appendixH] Note, 
that the likelihood for the power-spectrum given by eq. l l59t is 
clearly non-Gaussian. 



2.6.3 Joint signal and peculiar velocities estimation: 
redshift-distortions correction 

We propose to sample the peculiar velocities in a MCMC fash- 
ion (see se ction 12^, analogous to the case of the power-spectrum 
(see lWandelt et ah t2004) and section l2.6.2l (. We draw realizations 
of the matter field given the data, a power- spectrum and assumed 
galaxy peculiar velocities 



The velocities are subsequently sampled too: 



P(i 



(60) 



(61) 



In each step where we sample the peculiar velocity, the redshift- 
distortion can be corrected using 

^(j+i) ^ ^ _ ^(j+i) (62) 

We propose to sample the peculiar velocities from a PDF with a 
mean {i')m given by the linear theory vur and a velocity dispersion 
a-u depending on the local value of the over-density. 



P{v I s'^') o,g{v- (i;>M(s(^'),a^(s(^))) 



(63) 



where we have taken a Gaussian distribution, but this could be ex- 
tended to other PDFs. 



3 NUMERICAL METHOD 

In order to efficiently sample the joint PDF, as it is required in 
MCMC methods (see section IZ6t . fast inverse algorithms need to 
be considered to regularize the solution. General iterative inverse 
methods scale as 0{n'^) since they imply matrix multiplications of 
a n X 71 matrix in an iterative fashion (at most ri-steps until con- 
vergence). This makes the study of the joint PDFs as presented 
in section ( 12. 6t , at a first glance, un-feasible. However, a proper 
formulation of the problem in an operator formalism allows treat- 
ing the matrices as operators that have to be neither calculated nor 
stored. Within this operator formalism, the inversion methods we 
present here sped up to a scaling of 0{n logj n). We start with a 



general formulation of iterative methods and subsequently present 
the different schemes that we have implemented in ARGO. Since 
a preconditioning treatment can dramatically enhance the perfor- 
mance of iterative schemes (see our numerical experiments in sec- 
tion Q, we pay special attention to this point in the derivation of 
the different schemes. 



3.1 Iterative inverse and regularization methods: a unified 
formulation of different linear methods 

Let us consider a region D in the n-dimensional Euclidean space 
En and denote L2 (D) the Hilbert space of all complex measurable 
square integrable functions J„ d"z|gp(z) < 00 with inner prod- 
uct0 



{g\s) = / d"zg{z)s{z), 

J D 

and norm of g G L2 (-D) 

llsll = {9\9)'^'- 



(64) 



(65) 



Let ^ be a subspace of the Hilbert space L2 (D) with the conditions 
that every element ij} £ "i/ must satisfy being smoothness, limit be- 
havior at the boundary D, etc. Let us now consider the linear op- 
erator A, defined on the linear manifold ^, and suppose that A is 
a positive definite, i.e. {Atplip) ^00 for aU xp G The kind 
of inverse problem we are interested in belongs to the stationary 
problems of the form 



(66) 



since, for example, for the COBE-filter we have to invert 
A(s>coBE =R^N-M, with 

ip = {s>coBE' A = R'l^N-^R and / = R^N^^d, and for the 
Wiener-filtering we have 

ip = (SRt)-i(s)^p, A = (R+SR + N) and f = d. Eq. ^ has 
the same structure as eq. (|2}, but without a noise term. Hence, a 
regularization method is again required. 



3.1.1 Minimization of the quadratic form 

Another way of approaching the linear inverse problem is the min- 
imization of a quadratic form given by 



Q^W = -{A^\rP)-{f\-,p)+c. 



The gradient of leads to 



dip 



(67) 



(68) 



assuming that the operator A is self-adjoint. Setting the gradient 
to zero, one obtains eq. ( 1661 ). The surface defined by a quadratic 
form with a p ositive definite ma trix A is shaped like a paraboloid 
bowl (see e.g. IShewchuk|[T99i) . This ensures the existence of a 
unique minimum or, equivalently, the convergence of appropriate 
algorithms. 



Here a Dirac type notation is introduced. It should not be confused with 
the ensemble average notation, which does not have a balk in-between. 

This expression can be written in matrix notation as i/i^ Ai/i ^ 0, where 
i/it is the conjugate and transpose of the vector 
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3.1.2 Solution of the non-stationary problem: asymptotic 
regularization 

Here, a unified framework for the regularization methods that we 
have implemented in ARGO is given based on the asymptotic reg- 
ularization. Nevertheless, an original Bayesian motivation to the 
asymptotic solution is presented in appendix iLl 

The stationary problem (eq. I66t can be replaced by a non- 
stationary equation, which relaxes to the equilibrium solution 



dt 



We seek solutions of the form 

I 

with a spectrum for the operator A 

All; = XiUi. 

Expanding / in this basis, yields 



(69) 



(70) 



(71) 



(72) 



Then we get the following relations for the Fourier coefficients in 
the stationary case 



>^itpt = fi, 
and for the non-stationary case 



dt 



which lead to the following solutions 



and 



(73) 



(74) 



(75) 



(76) 



for the stationary and non-stationary cases, respectively. Since the 
spectrum of a positive definite operator A is real. A; > 0, it follows 

that limt^oo ip] non — stationary — stationary • 

The non-stationary problem can be solved using difference 
methods with respect to t 

^3+1 ^ ^3 r^M^if - Axp^), (77) 

with {M-*} being a set of non-singular matrices Q and {tj} being 
a sequence of real parameters. Here we concentrate on a constant, 
self-adjoint matrix M. Let us rewrite eq. ( 177 1 as 



(78) 
(79) 
(80) 



with the residuals given by 

The error vectors are defined as 

where ip* = A^^/is the exact solution. The matrix M and the real 
number {tj} are chosen to speed up the convergence. M usually 

1^ We implicitly generalized eq. |69) to dip{t)/dt = M{t){f - Atp), 
where the auxiliary matrix M is chosen to speed up convergence. 



represents the preconditioning of eq. illl and tj can be interpreted 
as the time step (see appendix iMt. and is also called relaxation pa- 
rameter. Here truncation regularization occurs by quitting the iter- 
ation loop. Some stopping rules are therefore required. In the case 
where no noise regularization was conducted in the first step, they 
crucially define the noise regularization. In the other cases, they 
mostly determine algorithmic performance and accuracy. At this 
point we are interested in the regularization for the inverse purpose, 
since we have already found expressions which regularize the noise 
(e.g. Wiener-filter, or MEM). However, the results presented in sec- 
tion|4]show that in some cases truncation leads to better results (see 
discussion in section |43t . In the following sections, we will show 
how different iterative schemes are based on the general formula 
given by eq. ( 177b . It is worth mentioning that other methods that 
we do not discuss i n this paper, li ke the algebraic reconstrcution 
technique (ART, see lGordorj|l974) . can also be expressed through 
this formula. 



3.1.3 Jacobi method 

The Jacobi iteration method splits the operator A in two matrices 

A = D + B, (81) 

where D contains the diagonal elements of A and B contains the 
off-diagonal elements. From eq. J66b one follows 

V=D-H/-BV). (82) 

Substituting B by A — D one gets the following iteration scheme 

^rP^ +D'\f -ArP'). (83) 

The Jacobi method turns out to be a particular case of the itera- 
tion scheme given by eq. J77t with a preconditioning matrix given 
by M = D^^ and r-* = 1. This method can, must be optimized by 
increasing the timestep r-* by a certain percentage if the solution 
converges and decreasing the timestep if the solution diverges. An 
optimal timestep is hard to find, because the spectrum of the oper- 
ator A has to be known (see appendix iMt. 

3. 1.4 Steepest Descent method 

The steepest descent method searches the minimum of the 
quadratic form by choosing the direction in which Qj^ decreases 
most rapidly. This direction is given by the residual 

-Q'j^i^')^ f -ArP' (84) 

The form of the iteration scheme is thus given by eq. J78l >. with 
the length of the step in the direction of the residual given by r-* . 
Steepest descent looks for the optimal length which minimizes the 
quadratic form with respect to r-* 



: 



(Qa(^' 



.dip 



(85) 

This implies that subsequent searching directions must be orthog- 
onal (say M — I). Starting from this condition it is straightforward 
to derive the expression for r-* . It is only necessary to use the defi- 
nition of residual for and substitute from eq. ( |78l l. 



(AM^^lM^^y 



(86) 



Both the calculation of the factors t-' and the residuals imply 
applying the operator A, each time on different vectors. It is pos- 
sible, however, to reduce the operation of A to the same vector for 
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every iteration, but the residuals, must be calculated in a different 
way. Multiplying both sides of eq. l l78b by —A and adding /, one 
obtains the following relation for the residuals 

e^'- ■ ■ ■ 



(87) 



Notice that the vector AM^^ already appears in the expression 
for r-*, and consequently saves one operation. However, expres- 
sion l l79b has to be periodically used with the feedback of i/)-', to 
avoid the accumulation of floating-point roundoff error. The disad- 
vantage of this method is that it ends up searching repeatedly in the 
same direction. This is especially severe when the quadratic form 
is highly deformed, which occurs when the matrix A deviates from 
the unity matrix. We will see, however, that steepest descent com- 
petes with any other method when the preconditioning is effective, 
and thus the stretched shape of the quadratic form is brought close 
to a spherical symmetric shape. Preconditioning should not imply 
too many operations; that is the reason why the inverse of the ma- 
trix, which contains only the diagonal elements of A, is usually 
taken for preconditioning. This will work especially fine when the 
operator A is diagonally dominant, which in our case occurs when 
nearly full-sky data are available. 

3.1.5 Krylov methods: Conjugate Gradients 

To make the iteration scheme more efficient. Conjugate Gradi- 
ents proposes to search each time in a different direction. This 
is achieved by imposing A-orthogonality to two different {i 7^ j) 
searching vectors and (i-' 

{m^|m'^)a = (Am^|/x')=0, (88) 

which are then said to be conjugated. In the preconditioned case, 
the searching vectors are multiplied by M so that the conjugacy has 
to be formulated in the following way: 
(M/x^' |M/x'>4 = (for i / j). 

The iteration scheme is given by substituting the residuals in 
eq. l l78t by the new searching vectors {/x-' } 

(89) 



By subtracting i/!* we obtain an equation for the errors, 

?7^+^ = 77-' + r^M/x^ (90) 
Taking into account the relation between the residuals and the er- 



= — Aj7 



(91) 



we can derive the recurrent formula for the residuals 

^^■+^ = -A(J7^' + r^M/x-') = - r-' AM/i^ (92) 

Here again, expression l |79l l has to be used periodically with the 
feedback of ■0-' to avoid the accumulation of floating-point round- 
off error. The optimal length of the step is found by minimizing the 
quadratic form 



Substituting expression J90l > in l |93l ) we then obtain 



(93) 



(94) 



It can be shown that this formula is equivalent to the following 
expression 



{M/x^-|Mm^-)a' 



(95) 



using (i-' lM/x^ ) = (I^ IM^J } (see appendix[K). 

To generate A-orthogonal searching vectors one could think 
of Gram-Schmidt-conjugation 



,jk k 

fj, . 



(96) 



Here it was assumed that the residuals {^-' } form a set of linearly 
independent vectors (see appendix 10. The expression for the fac- 
tors (9-''° can be derived by calling A-orthogonality in eq. l |96t 



= (MC^|M/x'>a + /3''(M/x'|M/x')a. (97) 
One obtains the following formula for the factors 



(M^^|M/x')a 



(98) 



where i < j according to eq. ( 1961 

This method seems to require too much memory, as apparently 
all previous searching vectors must be stored to calculate the new 
one. However, only one /3-factor remains in the sum in eq. l |96K as 
we show in appendix IK3 1 Hence, Gram-Schmidt orthogonalization 
can be simplified to the following expression 



where 



ii+i 



(99) 



(100) 



with EXP meaning expensive, since the nominator of 13 apparently 
requires an extra A operation. This additional operation can be 
saved by taking the vector AM/x-' from r-* or with alternative meth- 
ods (see table[2|and appe ndixiKt. like the Fletcher-Reeves method 
dFletcher & Reeveslll964h 



{em') ' 

the Polak-Ribiere formula jPolak & Ribierdl 19691) 



(101) 



or the Hestenes-Stiefel expression ( lHestenes&Stiefelll952h 



Pus — 



(102) 



(103) 



However, /3exp turns out to be a very efficient scheme, which be- 
haves far more stably than the rest (see section Q. Since the (3- 
formulae (eq. I100I103I ) are mathematically equivalent, one could 
think of combining them in a single scheme finding numerically 
different solutions. However, this kind of hybrid scheme remains 
to be thouroughly studied. 

Formula J99t shows that new searching vectors are built from 



Note that the sign of /3 depends on the definition of the Gram-Schmidt 
conjugation. An alternative definition with the negation of the residuals 
would cancel the minus sign in eq. j98). The sign of /3 can be regarded 
as a free parameter. 
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FR 


PR 


N3/D1 


— 




N1/D2 


N2/D2 


N3/D2 


— 




N1/D3 


N2/D3 


N3/D3 






N1/D4 


HS 


N3/D4 






N1/D5 


N2/D5 


N3/D5 




(M/i-'|M/xJ)^ 








EXP 



Table 2. Formulae for the /3-factor: /3^^^ = ■ Three of the methods are discussed in the literature: FR (Fletcher-Reeves), PR (Polak-Ribiere. and HS 
(Hestenes-Stiefels). The rest of the formulae are derived in this paper using equivalence relations derived in appendices IK 1 IK3l The FR and the PR methods 
are tested against the EXP algorithm in section @. 



a linear combination of the current residual and the previous search- 
ing vector. Since the subsequent residuals are given by the lin- 
ear combination of the previous residual and the A-operator ap- 
plied to the searching vector, the manifold where the solution is be- 
ing searched is spanned by the residuals and the so-called Krylov 
space. The latter is built by applying the A operator to the basis 
vector successively. In this manifold, curved quadratic forms ap- 
pear to be spherical and thus the searching process becomes more 
effective. It is possible to derive the Conjugate Gradients method by 
minim izing the A-norm of the error: minj|r7j|^ (see e.g. lMarchukI 
Il982l) . In this sense an optimal solution to the inverse problem can 
be found even if no unique solution exists. Conjugate Gradients 
works, even if the operator A is not a positive definite (for a dis- 
cussion see e.g. Shewchuk 1994). It can easily be shown that Con- 
jugate Gradients converges at most in n -steps, with n be ing the 
number of pixels/vector columns (see e.g. lShewchuklll 994h . 

3.2 Non-linear inverse methods 

Non-linear inverse methods are especially required in reconstruc- 
tion algorithms that do not assume a Gaussian distribution. The it- 
erative method given in eq. l|28), which makes use of a Poissonian 
likelihood, can alternatively be solved with the methods presented 
in this section. The same applies to the MEM, where zeros of the 
non-linear eq. l |45l l have to be found. 

The generalization of the regularization methods to non-linear 
inverse problems is possible with methods like Tikhonov regular- 
ization as mentioned in section l |2.5l l or like asymptotic regular- 
ization as will be shown below (a relation between both methods 
is shown in appendix |L1i. However, the proofs of the convergence 
properties are different since the spectr al theoretical foun dation is 
missing here. We refer the reader to e.g. lO'SullivanI JT990l) . 

Let us generalize eq. ( I66t to non-linear equations of the form 

A(-0) = /, (104) 

with A being a non-linear operator, and solve the non-linear and 
non-stationary equation given by 

^+A(V) = /, (105) 
with the forward Euler method. Discretizing the solution yields 

^j+i = _^ T'T{xp'){f - A(V'')), (106) 
with T being also a non-linear operator, typically given by VA^ 



or VA ^, though more complicated expressions exist (see the 
Levenbe rg-Marquardt me thod or the regular ized Gauss-Newton 
method, Hankel ll997l or ISakushinskiil 1 19921 and iBlaschke et al.l 
ri997. respectively). 

3.2.1 Newton-Raphson method 

One of the most extended non-linear inverse methods is the so- 
called New ton-Raphson method (for an application in MEMs see 
iMaisinger et al. 1997; Hobson et al. 1998), which can easily be de- 
rived by doing a Taylor expansion of the function under study and 
truncating it at the first order 

^j+i = _^ (VA(V''))-'(/ - (107) 

This method requires the inverse of the gradient of A, which for the 
cases we are interested in is the inverse of a Hessian matrix. Re- 
calling the problem of finding extrema of a function as presented in 
section O.l.ll l and taking into account eq. l |84t , the previous equa- 
tion can be rewritten as 

^P'+' = V' " (VVQa(V''))"'VQa('/''), (108) 

where VVQ^ = dQj^/dip'dip'^ is the Hessian matrix of Qj^. 
For a direct derivation of this equation, we require a Taylor ex- 
pansion until the second order of Q\, which is where the non- 
linearity arises. The MEM can be solved (eq. I45t with expression 
1 IIO8I 1 by doing the substitutions: and Here 

the quantity is implicitly approximated by its quadratic expan- 
sion Calculating the inverse of the Hessian {VVQ 
implies solving a linear ill-posed problem in each iteration of the 
scheme l |108t . Some solutions have been found t o regularize t his 
scheme, like the Levenberg-Marquardt method (seelHankell997h or 
the regularized Gaus s-Newton method (see e.g. lBakushinskiill992l: 
lBlaschkeetai]|l997b . 



3.2.2 Landweber-Fridman method 

Alternative algorithms to the above mentioned Newton-Raphson 
class of methods do not need to invert the Hessian matrix and can 
thus simultaneously speed up and stabilize the inversion process. 
The Landweber-Fridman algorithm belongs to the class of methods 
based on steepest descent 

^j+i ^ _^ (VA(i/''))''"(/ - A(i/'')). (109) 
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Making the same substitutions as for eq. l llOSt . we obtain 



-[0,n^ - l],ry 



40, — 1] and 



3+1 



(VVQa(^'))^VQa(V')- (110) 
For a convergence analysis of this method see iHanke et alj J1995r) . 



Here just the adjoint of the Hessian must be taken ( VVQ a (V'"')) ^- 



3.2.3 Non-linear Krylov methods 

Another class of methods that do not require one to invert the Hes- 
sian matrix are the Krylov-based methods, which we have exposed 
in the previous section. The difference with respect to the linear 
case mainly resides in the calculation of the residuals and the 
step size . The residuals are updated now by the negation of the 
gradient of the quadratic form that approximates the function un- 
der consideration f-* = — VQ^(i/'"') (see ea.l84t. The step size is 
given by 



(VQa(V^)|M/x^) 



(M/x-^'lM/xJ) 



VVQ^Cl/'") 



(111) 



The derivation of this expression (see appendix IK4t is based on 
the second order Taylor expansion of Q\. That is why Krylov al- 
gorithms which use this formula are called Newton-Krylov meth- 
ods. There are alternative expressions for the time step r-* where 
the Hessian is approximated and does not need to be explicitly 
calculated, like those using a secant approximation. For various 
implementations of non-linear Krylov methods see, for example, 
[Shewchuk (.1994.) . 



3.3 Operator formalism 

The iterative methods presented so far require an operator formal- 
ism to become efficient. In this formalism, matrices should be rep- 
resented in such a way that their action can be expressed as simple 
operations, like sums and multiplications. In order to achieve this, 
one has to carefully choose the adequate representation, in which 
the individual matrix components are diagonal, though the whole 
matrix may not be. In this section, we present the different oper- 
ators under consideration (see table [3} in k-space and real-space 
and discuss their optimal representation. In this way, we can take 
advantage of the fast Fourier-transform methods (FFTs) that scale 
as n log2 n, with n being the length of the arrays, and which ulti- 
mately determine the speed of the algorithm. 



3.3.1 Fourier-transform definitions and dimensionality of the 
problem 

Let us introduce the following definitions of the A^d -dimensional 
forward and inverse Fourier-transforms just to make clear our no- 
tation 



c(fc) EE FT[T(r)] = J d'^'^rexp{ik-r)x(r), ( 



112) 



and 



f rl^DA- 

x{r) = IFT[i(fc)] = J j^-^ exp(-ife ■ r)i(fe), (1 



13) 



respectively. 

In general, the reconstruction problem has three spatial dimen- 
sions (A'^D ~ 3), with the corresponding discrete array lengths 
for the real-space and k-space vectors given by r = {rx,ry,rz) 
and k — {kx,ky,kz). Each component has the following range: 



ll,fc. = |^[0,n. - 1], 



kx 

where the volume of the Universe under consideration is given by 
V = Lx X Ly X Lz in [(Mpc/h)'^], and the box containing that 
volume is divided into n = x Uy x Uz cells, with n being the 
length of the array x. In the following, we will treat the operators 
as being continuous. However, the discrete implementation can be 
derived in a straightforward way (for a discussion on the relatio n 
between discrete and continuous representations see lMartelll2005l) . 
Note that the methods presented here can be applied in arbitrary 
dimensions. The number of dimensions TVd is thus kept as a free 
parameter. 

In our convention, vectors defined in real-space have plain no- 
tation (x) and in k-space they are denoted with hats (£). Matrices, 
however, have two hats in k-space. We represent convolutions with 
circles "o" and multiplications with dots Due to the convolu- 
tion theorem, where convolutions are shown to be multiplications 
in the counter space, we can either omit hats if they are present 
or include them if they are not, and replace circles with dots and 
vice versa "■ ^ o" to change from one representation to the other. 
All the numerical iterative inversion schemes (see section[3} of the 
different reconstruction algorithms (section |2l require only a small 
number of basic operators, listed in table ([3}. To show how the op- 
erators listed in table ([3} can efficiently be applied we derive their 
action on an arbitrary vector. 



3.3.2 Data model: the response operator and its transpose 

Let us first remember the data model given in eq. ([3}, and suppose 
that the operator Rp is given by a convolution in real-space with 
some blurring function /b 

d(r) = J d'^^r'fB(r-r')fs{r')fM{r')s{r') + fsF{r)eNir). 

(117) 

The operator R acting on an arbitrary vector {x} is thus given by 
R{a;}(r)= / d^'^r' fsir - r')fs{r')fM(r'){x(r')}. (118) 



The selection function and the masks should conveniently be mul- 
tiplied in real-space to save convolutions 

/sM(r) = /s(r)/M(r). (119) 

Accordingly, the same operation as in eq. dl 18b leads to 



R{x}(fc) = /B(fc) 



(27r)^D 



.fsMik - q){£{q)} (120) 



/smo{*} 



/b • [fsM O {x}] , 

in k-space. Here we have introduced the operator notation in which 
the equations have to be read from right to left. The braces show the 
sequence in which the subsequent operations have to be performed 
in the algorithm. The analogous operation for the adjoint R^ can be 
derived from the definition of the response operator in real space 
(see eq. ll 18b leading to 



In k-space it yields 



Rt{a;}(r) = /s(r)/M(r) / d^'^r' fsir' - r){x{r')}. (121) 



R{x}{k) = fsM o [/b ■ {x}] (fe). (122) 
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RSRt{i}(fc) = j ^_^(a(fc)a(fc')>(s,e|p){£(fc')} 



(2^)^ 

/b 





7sM(fc 


(27r)^D 


d^Dq 
(27r)^D 


/SM(fc 


d^Dq 


7sM(fc 


(27r)^D 



^B(fc) / 7!!-^/sM(fc-q)Ps(q) / ^^/smCq - fc') /B(fc') ' {^(fc')} 
J (27rj"D y (27r)"D v ^ ^ 



/b-{«} 



/smO |7b-{*}] 



^'s-[/sm°[/b-{*}]] 



/sm°[-Ps-[/sm°[/b-{*}]]] 



/b-[/smo[Ps-[/smo[/b-{x}]]]] 



(114) 



RTN-^R{i}(fc) 



(^/SM(fc-<,)/B{<j) y ^^P«-(q')(2^)^°^D(<7-<j')/B(<j')/ ^^^/sM(9'-fc'){£(fc')} 
y 7i3lN^/SM(A=-q)/B(q)PN-'(q)/B(q) y — ^/sm(9 - fc'){£{fc')} 



/smo{4} 



/b- [/smo{4}] 



J=n-1'[/b'[/sm°{*}]] 



/b-[^'n-1-[/b-[/sm°{*}]]] 



/SM ° |7^^' [Pn - 1 ■ [/b ■ [/SM0{4}] ] ] ] 



RtN^NR{*}(fc) = y (^^/sM(fc-q)/B(9) y ^^^^WN-'(9-9')/B{q') y ^^^/sm(9' -fc'){£{fc')} 

' V ' 

/SM°{*} 



(115) 



/b- [/SM°{«}] 



iVwN~^o [/b- [/smo{4}]] 



/b • [iVwN - 1 o [/b • [/sm o{4}] ] ] 



/sm°[/b' [-'VwN~^o[/b-[/smo{4}]]]] 



(116) 



Figure 2. Here the action on an arbitral^ vector x of the most complex operators that appeal' in table ^3) is shown. The upper one is required for Wiener- 
filtering and represents the signal term in the covariance matrix of the data. The middle and lower ones stand for the inverse of the ML variance (eq.[30) and 
are required for the COBE-filter, the MEMG and for sampling purposes with the Wiener-filter. The equations have to be read from right to left. The braces 
show the order in which the operations have to be done from top to bottom. One has to be very careful with the correct conjugation of the different functions. 
Note that, contrary to naiv expectations, the conjugation of the first selection function /sm to be applied in the upper operation disappears. 
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X 


X 




X 




X 


x 




MEMP 


X 


X 
















# additional operators required for the 


signal-space representation (see eg. 1571 







Table 3. Operators in columns needed for the different estimators in rows, the COBE-filter (29), the Wiener-tilter (26), the GAPMAP estimator (28), and 
the MEMs (sections 12. 5. 9I & [ 3.21 and appendix|j). Note that the trivial diagonal matrices have been left out of this table. The first two estimators are linear 
estimators, whereas the rest are non-linear. MEMG and MEMP stand for the Maximum Entropy method with a Gaussian likelihood and with a Poissonian 
likelihood, respectively. Note that some of the operators have to be further inverted either directly, like (R^NR)"^ for the COBE-filter, or in combination with 
other operators, like (RSR^ -|- N)~^ for the Wiener-filter. The methods presented in section (3) show how to do this implicitly by applying the operators in an 
iterative fashion. 



Note, that this expression can be naturally obtained by calculat- 
ing the signal term of the data-autocorrelation matrix (see the up- 
per operator in fig.[2ll. In section l |4.2.4| l we will consider a Gaus- 
sian smoothing of the signal, as could happen through an observa- 
tional process, where we test the deconvolution with our scheme. 
However, the blurring function that of main interest in the matter- 
field reconstruction is given by the mass assignment function, or 
pixel window, which describes the effect of representing point 
sources (such as galaxies) on a grid. The most popular assign- 
ment functions are the nearest grid point (NGP), the clouds-in- 
cell (CIC), and the triangul ar-shaped cloud functions (TSC) (see 
iHocknev & Eastwood|[l98ll) . 



3.3.3 Covariance matrix of the data 
The data model consists of two terms 

a(r)= J d'^^r' fB{r-r')fsM{r')s{r'), (123) 

and 

<r) = /sp(r)eN(r-). (124) 
The same quantities in k-space are given by 

a{k) = /B(fc) I -^^fsMik - q)s{q), (125) 

and 

m = I -^^fMk - q)^N(q). (126) 

Consequently, the covariance matrix of the data is given by the fol- 
lowing sum 

(d(fc)d(fcO)(s,e|p) = (a(fc)a(F))(s,e|p) + {mW)}is,e\p), 

(127) 

where we have assumed that the noise is uncorrelated to the signal, 
which is consistent with our data model. Even though the structure 
function may be correlated with the signal 

{s{k)fsF{k'))^g j: 7^ 0, the random noise part is not 

{s{k)eN{k'))^s,e\p) ~ 0. We will calculate the different terms of 
the data covariance matrix and other related operators in the next 
sections. 



3.3.4 Covariance matrix of the data: the signal term 

Here it becomes necessary to choose the Fourier representation, 
since it is there that the signal-autocorrelation matrix appears to be 
diagonal in the form of a power spectrum (eq. I128t . Taking into 
account statistical homogeneity for the signal s 

{3{k)W))<.s\p) = (27r)^°&(fc - k')Ps{k'), (128) 

with 5d being the Dirac-delta function, we can derive the expres- 
sion for the signal covariance matrix term 

(RSV) (fe, fe') = (a(fe)a(fc'))(s|p) (129) 

(' d'^^a ~ — 

= ,fB(fe) j ^^--^/sM(fc~q)Ps(q)/sM(fc'-q)/B(fe') 

= /B(fe) j j^-^fsM{k~q)Ps{q)fsM{q-k')fB{k'), 
For its action on a vector (see fig.|2), we get 

RSRt{4}(fe) = /b ■ [/SM ° [Ps ■ [fsM ° ■ {*}]]]] (fc), 

(130) 

and consequently 

SRt{*}(fe) = J |^(.?(fe)d(fc'))(s|p){*(fc')} 

= Psik) J ^^/sM(fc-fc') /B(fc') - {Hk')} 

/smO [/b-{*}] 



• [/smo [/b •{£}]]. (131) 

The inverse of the signal-autocorrelation matrix can be solved triv- 
ially in Fourier-space: 

= diag(Ps(fc)~^)- Hence, the inverse square root yields 

§-i/2 = diag(Ps(fe)-^''2). 

3.3.5 Covariance matrix of the data: the noise term 

Here, we will consider the noise covariance matrix correspond- 
ing to the definition of the likelihood. Note, that this expression is 
equivalent to the noise term in eq. M21\ if the noise structure func- 
tion has no signal dependence (see discussion in section l2.5.3b . We 
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assume, analogous to the case of the signal, statistical homogeneity 
for En 



(eN(fc)ek(fe')>{e|p) = {27r)''^5o(k - k')P^{k'), (132) 
and then derive the expression for the noise covariance matrix 
N(k,k') 



The corresponding action yields 
NwN{a;}(fe) 

~ -PwN 



fsF{q - k'){x{k')} . 



/sfo{*} 



(e(fe)e(fc')>( 

f d^"g 
J (27r)^D 

Its action on a vector yields 

d^'^k 



e|P) 



/SF° /SF°{«} 



/sF(fc - q)PN(q)/sF(q - fe').(133) 



N{x}(fc) = 



(27r)'VD 



(6(fe)^(fe'))(e|p){*(fc')} 



[/SF o [/sF o {x}]] = PwN ■ [/If o {x}] (143) 

It can be seen from this equation, that the preferential representa- 
tion now is in real-space, where N is diagonal 

NwN{r,r') = Soir - r')CwN/|F(T-'), (144) 



/d"°q f \ 7-, / \ f d °fc' f , ,i\r~r,i\-, with CwN ~ IFTfPwNl being a constant. The inverse operation 

J^fsHk-q)PMq) J j^fsHl~k){x{k)} ^^^^^^ 



/SF°{4} 



■Pn- /SF°{«} 



/SFO [Pn- [/sFo{a;}]], (134) 

In the case where there is no structure function, the noise- 
autocorrelation reduces to 



iVN(fc, fc') = (27r)'^°fo(fe - fc')PM(fc'). (135) 
Then, its action is given by 

SjN{*}(fc) = Pn • {x}{k). (136) 
The corresponding inverse operation is 

^^'{A}{k)=Pt,-'' -{xjik). (137) 
Consequently, we obtain (see fig.[2j 

RtNN'R{A}(fc) = fsM o ■ [Pn-' ■ [/b ■ [fsM o {£}]]]]{k), 

(138) 

and 

RtNN'{a;}(fe) = /sFO [^- [Pn"' •{*}]] (fe). (139) 
The inverse square root of Nn can now be calculated and leads to 

(140) 



NwN{a3}(r) = (Cwn/If)"' ■ W(r). (145) 
Hence, the inverse square root yields 

NwT{r,r') = 5u{r - r')CwN-'^^ fs^^r), (146) 
and its action in k-space reads 

AwN'{*}(fc) = Avn"'/' ■ [/s"f o {£}] (fe). (147) 
Then we get (see fig.|2j 



NN'/'(fe) = diag(PN-'/'(fe)). 

The operation R^N],^'^^{a;} can then be obtained by doing the fol 
lowing substitution Nj^^ N^^''^ in eq. ( |139l l 



RTN-''"{x}(fe) = fsF o [/b ■ [Pn-''' ■ {x}]]ik). (141) 

We are especially interested in the case of white noise (Pn = 
PwN ~ const) with a structure function (given by the Poissonian 
shot noise) 



NwNik, k') = PwN 



(27r)^D 
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/sF(fe-q)/sF(fe'-q). (142) 



RtNwN-R{A}(fe) = /sF ° [/b ■ [N^N ° [/b ■ [fsF o {£}]]]] (fe), 

(148) 

and consequently 

RtNwN{*}(fc) = /sF o [7^- [N^l, o {x}]]{k). (149) 

To calculate RtN^'j^'^{a;} one has to do the following substitution 
]^WN ^ I^wn' in eq. GlU 

RtN^N^'fiKfe) = fsF o [^ ■ [A^li' o {i}] ] (fe). (150) 

In summary, we showed that the action of the different opera- 
tors on a vector required for the different reconstruction estimators 
(see table O can be calculated in a straightforward way, as an or- 
dered series of products and convolutions. Note that whenever we 
need to perform a convolution, we change to the counter space rep- 
resentation with FFTs and do multiplication there. 



4 EFFICIENCY AND QUALITY VALIDATION OF THE 
INVERSE METHODS WITH THE WIENER-FILTER 

In this section the Wiener-filter implemented in ARGO is tested 
with the different linear inverse algorithms presented in the section 
of numerical methods {3} under several conditions determined by 
structured noise, blurring, selection function effects and window- 
ing. 

The inverse methods that we test here are the Jacobi (J), 
the Steepest Descent (SD), and several Rrylov methods, like the 
Fletcher-Reeves (FR), the Polak-Ribiere (PR), and the EXP Con- 
jugate Gradients method (see section lJ. 1.5 l and appendix I K3t . This 

In order to a void aliasing effe cts one has to adequately perform zero- 
padding (see e. 



I a void aliasing ette cts 
.g. lPress et Jll992h . 
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scheme has not been previously discussed in the literature and turns 
out to be very efficient as will be discussed below. Many other 
Krylov methods (see table |2} can be built from simple equivalence 
relations, as we show in appendix [K] However, only the methods 
mentioned above are taken into account here, as we consider them 
to be sufficiently representative. The extra-regularization we pro- 
pose with these Krylov methods converts the Wiener-filtering in a 
hybrid Tikhonov-Krylov space regularization method. In addition, 
we also test the Wiener-filter that uses hermitian redundancy as de- 
rived in appendix IB] We call the Wiener-filter defined by the map- 
ping equation l lB6t the conjugated Wiener-filter (CJ), whereas the 
Wiener-filter defined by eq. ( |B8t has no extra suffix. 

With the aim of having full control over the synthetic data, 
we ge nerate Gaussian random field^3 with the IPeacock & Doddsl 
( 1 19940 formula for the power spectrum. The resulting real den- 
sity field is denoted by 5roai = 5p, and the reconstruction by 
5roc = V"- The signals are discretized and arranged as vectors given 
by [k + riz x (j + ny x i)], where i G [0, — 1], j £ [0, Uy — 1], 
and k G [0,nz — 1]. The algorithmic part of the reconstruction 
methods shown in section l|3) does not change with the dimension- 
ality, but solely the length of the vectors given by n = n^; x x 
change and thus also the dimension of the involved matrices. The 
formulation of the matrices is explained in detail in section l |3.3l >. 
The Fourier transforms must be accordingly called with the dimen- 
sions under consideration, which occurs in ARGO by switching be- 
tween the different FFTs given by FFTW0. In addition, the power 
spectrum that is used for the reconstruction has to be set up with the 
corresponding length and the data have to be correctly rearranged 
to their original dimensions ^ [k + Uz x (j + Uy x i)]) 

after their manipulation. 



4.1 One-dimensional example 

We can see in fig. (O an example of a Gaussian realization in one- 
dimension (red curve) that can represent a time-line. A structured 
noise that increases with the distance (/spfr) oc r) and with a ran- 
dom noise component (evvN = G(0, 1 r^f) was added to the sig- 
nal. Finally a region was excluded simulating windowing effects. 
The resulting curve was taken as the input signal (yellow curve). 
The reconstruction given by ARGO is in blue and green, where 
the boundary effects were considered in the first case, but not in 
the second. There the signal was assumed to be zero in the UN- 
sampled region. We can see that the blue curve better resembles the 
real signal guided by the trend at the boundary. This effect is much 
larger in multiple dimensions as is shown in the next section. In the 
right plot in fig. (O, two sampling processes are underlying the yel- 
low signal. First, the Gaussian random field that generates the red 
signal, which is then Poisson sampled thus leading to the yellow 
data. Again the blue and the green curves represent the reconstruc- 
tions with and without proper window treatment, respectively. In 
this case, the blue curve also approaches the true signal better. 



4.2 Multi-dimensional test cases 

ARGO has been implemented such that the global dimension 
A^'d (see section 13.3. It , and even the length in each dimension 
(Ux, Uy, Uz), can be chosen arbitrarily. Our tests in one-, two- and 
three dimensions show that the results do not differ qualitatively. 
The convergence behavior changes with the length of the arrays 
(n = Tlx XHyXUz) as n logj n fully determined by the FFTs, as we 
showed in section l|3j. For the demonstration cases in this paper, we 
have selected the two-dimensional tests with 128 x 128 = 16384 
pixels. However, three dimensional tests were also carried out lead- 
ing to the same conclusions. 

4.2.1 Qualitative and quantitative measurement of the quality of 
the reconstruction 

To give a quantitative measurement of the quality of the reconstruc- 
tions, we define the correlation coefficient r between the recon- 
structed and the real density field by 

This statistical quantity is not very sensitive to the overall distribu- 
tion and yields good values (close to unity) in some cases even with 
poor reconstructions (see section l4.2.5t . The pixel to pixel plot of 
the real density field against the reconstruction is highly informa- 
tive because the scatter in the alignment of the pixels around the 
line of perfect correlation (45° slope) gives a qualitative goodness 
of the reconstruction. In general, the quality of the recovered den- 
sity map is better represented by the Euclidean distance between 
the real and the reconstructed signals. The ensemble average of this 
quantity can also be regarded as an action or loss function that leads 
to the Wiener-filter through minimization (see appendix |B]l. Here 
we introduce the volume-averaged squared Euclidean distancj^ 

DL,i(V',5p) = ^ j A^'^r [^(r) - 5,{r)\\ (152) 

with V = Lx X Ly X Lz. We further normalize the Euclidean 
distance through the following definition 

i^EuclW, Opj = J-i JTi (iJ^) 

Deuci(V'o,5p) 

where %po is the zero vector. We define the convergence tolerance 
criterion based on the squared Euclidean distance between subse- 
quent reconstructions 

toli+: =dL„(V'^+\V^). (154) 

We prefer this criterion with respect to the squared residuals 1 1^| P 
(see eq. |79t because all the tests show that no further statistical 
quality improvement in the reconstructions is reached after toP^|;J, 
as can be inferred from the correlation coefficients r and the nor- 
malized squared Euclidean distances ^p)- 



20 We use GARFIELDS: GAussian Random FIELDS, a program we 
developed to generate Gaussian random fields from a given power spectrum. 
The method can be found in detail in lMartell ^20051) . 

21 ppxw is a C subroutine library for computing fast discrete Fourier 
transforms in one or more dimensions of arbitrary input size and of both 
real and complex data: http://www.fftw.org/ 

22 0(0, 1): zero mean and variance 1. 



4.2.2 Numerical performance with and without preconditioning 

Here we analyze the convergence behavior of the different inverse 
schemes with and without preconditioning. We start by considering 
a Gaussian random field with some structured noise that increases 

23 Note that D2^^j(V>,5p) = ii'L.iCV, <5p). 
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Figure 3. ID Reconstruction with structured noise & window: The left plot shows the reconstruction of a one-dimensional noisy signal. The red curve is the 
true underlying signal. The yellow lines represent the measured data in each grid cell. The data are windowed by a function given by the black line. A random 
noise with a structure function that increases with the distance with respect to the origin has been added to the true signal. The green and the blue lines show 
different reconstructions. In the blue case the windowing is formally treated, whereas in the green case the unseen region is modeled by a mean signal, which 
is zero in this case. We see that the unsampled region is estimated by the blue curve better than by the green curve, where the edge effects were neglected. 
The proper treatment of edge-effects gives even better results in the sampled regions close to the the borders of the unsampled regions. This improvement can 
cleai'ly be seen in section j4.3) . Poisson noise: In the right plot, two sampling processes are underlying the yellow signal. First the Gaussian random field that 
generates the red signal, which is then Poisson sampled leading to the yellow data. Again, the blue and the green curves represent the reconstructions with and 
without proper window treatment, respectively. 



radially and is modulated by a random noise component. As a pre- 
conditioning expression, the diagonal part of the data covariance 
matrix is chosen, which is given by the sum of 

(RSRt)(fe,fc) = 

/B(fc) J ^^^/sM(fe-q)Ps(q)/sM(q-fe)/B(fe) 

= PB{k) I -^^PsMik - q)Ps{q) (155) 

^ V ' 

Pb ■ [PsMoPs], 

and 

= I ^^^/sp(fe-q)PN(q)/sF(q-fe) 
PSF O Pn, 

where we have used the following definitions: Pb = ||/b|P, 
PsM = ||/sm||^ and Psf = ||/sf||^- We can thus calculate the 
preconditioning matrix M required for the different schemes (sec- 
tion[3} by just inverting each diagonal component. The results sum- 
marized in figs. (l4]and[5j show important differences between the 
reconstructions done with (on the left side of fig.|5} and without (on 
the right side of fig.O preconditioning. Some of the methods just 
speed up, like the various EXP methods or the SD scheme. Oth- 
ers, however, are stabilized and manage to converge to the solution 
only after preconditioning, like the J, the FR and the CPR methods. 



Without preconditioning, the latter converges extremely quickly to 
a wrong solution. This is due to the fact that we did not impose the 
follo wing stabilizatio n: /3pr = max(/3pR,0) in this calculation 
(see iShewchuklll994 for a discussion). However, our tests show 
that upon imposing this stabilization the PR-method becomes sig- 
nificantly slower than the rest. On the other hand, the EXP-Krylov 
methods behave most stably and converge very quickly. In the pre- 
conditioned case, we see that all methods converge to the same 
statistical result, as we can infer from the correlation coefficient r 
and I'euci(^i ^p)' except for the PR scheme that yields slightly less 
optimal results (see the green line in comparison to the rest in panel 
c). We have tested preconditioning in the rest of the examples and 
could confirm the results presented in this section. Precondition- 
ing turns out to be necessary to achieve fast algorithms. In the next 
subsections we present results with a Poissonian distribution (fig.O 
and with blurring (fig.|7l(. Their corresponding numerical efficiency 
tests are shown in fig. {S}. The same kind of studies are done with 
a simulated selection function (fig. ^ and with a mask (fig. lIOI l. 
Their respective numerical behaviour can be seen in fig. dl It . 



4.2.3 Poissonian distribution 

In this study case, we investigate the reconstruction of a Gaussian 
field based on a Poissonian distribution. This model is far from 
reality, where much more complex processes are known to occur 
(see discussion in section |2. 1.1 1 ). However, we can model a non- 
Gaussian process in this way and test how good the Wiener-filter 
reconstruction works under such circumstances. Here the assumed 
data model does not coincide with the one that has generated the 
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Figure 4. Structured noise treatment: The upper left picture shows the real signal. The upper right picture is the input signal, where some random noise 
that increases radially was added. Note that the scale of the colourbar changes from a maximum overdensity of 20 to 70. The lower left picture c shows the 
reconstruction. The reconstructions using different numerical methods implemented in ARGO are indistinguishable. In the lower right image d, the real density 
field is plotted against the reconstructed density field pixel by pixel without any smoothing. The numerical peii'ormance of this reconstrcution case is shown 
in the next figure. 



data. However, the Poissonian noise can be modeled in the noise 
matrix of the Wiener-filtering through the structure function /s . 

The results presented in fig. l|6) show very good agreement 
between the reconstruction and the real underlying density field 
(compare panels a and c). The convergence behaviour and statisti- 
cal goodness is plotted in the left side of fig. ([8}, panels a, c and 
e. There we can see that the FR and PR methods do not converge 
rapidly (see yellow and green curves in panel a). On the contrary, 
the J, SD, and EXP schemes are very efficient (panel c) and lead to 
very similar results (panels c and e). 



eqs. ( fT30l ) and i fmT l. and yields the fi gure shown in panel c. We 
see how much of the small scale structure is restored and the peaks 
become enhanced. The correlation between this reconstruction and 
the original signal (panel e) is significantly better than for the case 
where the blurring is ignored (panel f). We can see in fig. {Sj that 
the deconvolution algorithm is very fast for all the methods except 
for the FR-scheme. The PR-method is the fastest, but it leads to 
slightly worse results (see the green curve in panels c and e). The 
EXP turns out to be more efficient than the J and SD methods in 
this case. 



4.2.4 Blurring ejfects: deconvolution 

In this numerical experiment we tested the blurring effects by con- 
volving the density field with a Gaussian. The result is shown in 
fig. l|7]l, panel b. We see how the small structures are smoothed out 
and only the larger ones prevail. Some noise with a structure func- 
tion was added to the signal. However, the noise was kept low with 
the aim of investigating primarily the blurring effect. The results 
of the reconstruction that considers only the noise does not change 
much with respect to the input signal, as can be expected. However, 
the extra-regularized Wiener-filtering deblurs the image applying 



4.2.5 Selection function ejfects 

For this case we use a modified data model in which the selection 
function also affects the noise 



d= fs ■ (s + fsF ■ ewn), 



(157) 



with fs G [0, 1], simulating the fading strength of the signal with 
increasing distance. The results are plotted in fig. {9}, where the 
structure of the signal can be seen to become indistinguishable in 
radial direction (see panel b). Taking only the noise into account 
leads to very poor reconstructions (see panel d). On the contrary. 
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Figure 5. Numerical performance with and witliout preconditioning: Here the convergence behaviour and the goodness of the reconstructions using 
different inversion algorithms can be seen. The pictures on the left show the methods using preconditioning, whereas the pictures on the right do not use 
preconditioning. The upper plots show the squared Euclidean distance between succesive reconstructions. The plots in the middle show the normalized 
Euclidean distance between the different reconstructions and the true signal. The lower plots show the evolution of the statistical congelation coefficient 
between reconstruction and signal. We see from panel c and panel e that after less than 10 iterations the reconstructions do not significantly improve with most 
of the inversion algorithms. The different inversion algorithms used are: Jacobi (J), Steepest Descent (SD), Conjugate Gradients (CG), Fletcher Reeves (FR), 
and Polak Ribiere (PR). We also tested a more expensive variant that uses one additional operation of the involved matrix (EXP) and one other variant (CJ), 
where a degree of freedom in the mapping equation for the Wiener-filter is used. 
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Figure 6. Poissonian noise: Here two stochastic processes are underlying tlie input signal. First the Gaussian random field that generates the signal in panel 
a, which is then Poisson sampled leading to the signal in panel b. The reconstruction in panel c is shown to be in good agreement with the underlying signal. 
The pixel values are correctly distributed as can be seen in panel d. 



by also considering the selection function effects, the structures are 
resolved even at contours where only 10 % of the signal plus noise 
is left (see panel c). As can be appreciated in panels e and f there 
is an improvement in the correlation between the real density field 
and the reconstructed signal. Panel e shows a higher correlation co- 
efficient, but the quality enhancement of the reconstruction can be 
seen better in the distribution of the density values for each pixel. 
How the points are correctly spread along the diagonal line can 
be verified there. The longer Euclidean distance to the real density 
field shows the quantitative difference very clearly, by just compar- 
ing the pink curve with the rest (fig. [TT] and panel c). It is worth 
mentioning that although the PR test seems to give a comparable 
result to the calculation that ignores the selection function. The fi- 
nal correlation coefficient in panel e shows that the reconstructions 
actually strongly differ and panel c shows that the quality of the 
recovered signal is notably better for the former experiment. 

In addition, we tested the same selection function affecting 
only the underlying signal with a model given by 

d = /s ■ s + /sF ■ ewN, (158) 
and obtained the same qualitative results. 



4.3 Windowing effects 

In this section we investigate the mask effects that introduce cou- 
pling between different modes in Fourier-space so that the data co- 
variance matrix is no longer diagonal. The input signal is given in 
panel b of fig. JIOI ). The noisy signal from panel b in fig. (O was 
cut in stripes to simulate observed regions. We compare two re- 
constructions here, the first one ignores windowing effects given 
in panel d and a second reconstruction employs the proper treat- 
ment of the boundary through /m in the algorithm (see eqs. |130| and 
|131t . The statistical correlation is given in panels e and f, respec- 
tively. Our experiments show better results not only for the latter 
reconstruction in the un-sampled region (fi), represented by the red 
dots in panels e and f in fig. l llOt . but also in the sampled regions 
(S7). The global correlation r is significantly improved. Whereas 
the distribution of the black dots, the values of the densities in the 
observed regions, does not apparently change, the distribution of 
the un-sampled red dots clearly does. These are distributed around 
the zero value for the case where windowing is ignored because a 
zero signal is assumed by ARGO in the SI region. In contrast we 
see that the red dots are distributed along the diagonal line when 
edge effects are considered. This is equivalent to a propagation of 
the information to the un-sampled regions or the appropriate in- 
terpolation and extrapolation of signals. Looking at the numerical 
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Figure 7. Blurring treatment: Here the signal (panel a) was convolved with a gaussian modeling blumng effects, as shown in panel b. Some low noise with a 
structure function was added. Panel c shows the deblurred result. Panel d takes only the noise into account. We see in panel f the correlation between the input 
signal and the true signal, because the noise is negligible. The correlation coefficient is thus very high, however, the alignment of the pixels in the plot is not 
correct. Overdensities and underdensities tend to be underestimated, which is consistent with the blumng effect. The reconstruction given in panel e corrects 
this effect and consequently a higher correlation coefficient is achieved. 
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Figure 8. Poissonian noise and numerical performance (panels a, c, e): Here the convergence behaviour and quahty of the reconstruction is comparable 
for the J, SD, EXP methods. The FR and PR schemes do not present a fast convergence (panel a). Nevertheless, the FR scheme (yellow curve) seems to lead 
to the correct solution (panels c and e). The PR formula, on the contrary, stagnates at reconstructions that have much lower quality compared to the rest of the 
schemes. Blurring treatment and numerical performance (panels b, d, f): In this study case, the EXP algorithm seems to work better than the rest of the 
schemes. Although the PR formula converges very rapidly (green curve in panel b), it leads to a lower quality reconstruction (panels d and f). The FR scheme 
converges to the same solution as the J, SD, and EXP algorithms, however, with a slower convergence (yellow curve in panel b). The J and SD methods have 
an overall good behaviour in this case, but still converge significantly slower than the EXP scheme (their convergence is identical black and red curves are 
oveiplotted). The reconstruction considering just the noise is very poor, because the noise is negligible in this case (pink curves). 
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Figure 9. Selection function treatment: Here selection function effects were simulated with a function that takes values between zero and one. decreasing 
exponentially in radial direction. The contours show different values of this function. Panel a shows the real density field. Panel b shows the input data, 
where the true signal was multiplied in real space with the selection function and a radially increasing noise was added. The reconstruction and its correlation 
with the true signal are represented in panel c and e, respectively. The reconstruction ignoring selection effects by taking only the noise into account leads 
to panels d and f. The reconstruction given in panel d is very conservative and smooths the overdensities out due to noise supression. This leads to a high 
correlation coefficient, though the individual pixels are clearly not correctly aligned (panel f). Panel c, on the contrary, shows more structures that are enhanced 
due to consideration of the selection function effects. This correctly distributes the pixels, as can be seen in panel e. The correlation coefficient seems to be 
significantly better than in panel f, however, a better measure of the overall quality of the reconstruction can be seen in next figure. 
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Figure 10. Windowing treatment: Here the edge effects are shown in two dimensions. The true signal was multiphed by a windowing function that is one 
in the observed region (Q) and zero in the unknown region (H). The sampled regions are given by the veilical stripes. In addition, a radially increasing noise 
was added (see panel b). Panel c shows the reconstruction handling the edge effects. Panel d represents the result taking only the noise into account. We see in 
panel c how the information is propagated into the unsampled regions leading to a closer resemblance of the real signal, whereas the noise is just suppressed 
in panel d. Panels e and f show the correlation coefficients for the whole reconstructed region, split into the sampled (black dots) and the unsampled regions 
(red dots). Note that the red dots are strongly aligned around the zero value in panel f, whereas they are con'ectly spread in panel e, statistically representing 
the information propagation process mentioned above. 
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Figure 11. Selection function treatment and numerical performance (panels a, c, e): The same color coding is used as in fig. (Sj panel a, except for 
additional curve (represented in pink) that indicates the reconstruction in which the selection effects are ignored. Panel a shows the squared Euclidean distance 
between subsequent reconstructions. The squared Euclidean distance between the reconstruction and the true density field is plotted in panel c, showing a huge 
difference between the reconstruction which takes only the noise into account and ignores the selection function and the rest of the methods. Note that the 
statistical con'elation r is also much better for the case where the selection effects are properly treated (panel e). One concludes from the thi'ee plots, that the 
SD and EXP methods (red, blue and violet curves) clearly converge faster to a more or equally optimal solution in comparison with the rest of the methods. 
The J scheme shows a significantly slower convergence (black curve in panel a). The PR algorithm stagnates at poorer reconstructions as can be seen from 
panel c and e. Windowing treatment and numerical performance (panels b, d, f): In this case, the PR shows extremely good results: fast convergence 
(panel b) and a high con'elation coeficient (panel d). However, the Euclidean distance is slightly bigger than for the rest of the methods, except for the pink 
curve (ignoring windowing effects). The FR method is disastrous in this study case and diverges from the solution as can be seen in panel f. The J, SD, and 
EXP methods show good and stable results. The J and SD algorithms give extremely similar results. Although their convergence behaviour is similar to the 
EXP schemes, the latter give slightly better results: smaller values for the Euclidean distance and higher values for the coiTelation coefficient (violet curves in 
panels d and f, respectively). 
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performance in fig. Jl lb reveals that most of the methods behave 
very similarly, except for the PR and FR schemes that deviate from 
the rest. The former converges rapidly to a good solution that has a 
higher correlation (see green curve in panel f), but a slightly worse 
Euclidean distance to the true signal. The FR on the other hand 
converges extremely slowly. The correlation coefficient is at a stage 
where it becomes dramatically worse (see yellow curve in panel f). 
The smaller Euclidean distance is no measure for the quality in this 
case because these low values can be achieved when the recon- 
struction is very conservative (closer to zero) and has no structure. 
Notice how many schemes start with better values for that distance 
measure (see panel d). The EXP methods converge faster and the 
CJ version leads to even slightly better results (see violet curve in 
panels d and f). 

It is also worth mentioning that the best reconstructions in 
terms of high correlation coefficients and low Euclidean distances 
to the underlying signal are achieved only after three iterations for 
the J, SD, and EXP methods, prior to numerical convergence. We 
furthermore tested ARGO under extreme noise conditions in which 
the inversion diverges and produces density values that approach 
infinity. At early iterations, extremely good reconstructions were 
produced. These examples underline the regularization character of 
the inversion schemes under consideration in this paper. However, 
for the cases we are interested in, where the noise is mainly deter- 
mined by the discrete sampling of galaxies, no additional stopping 
rules are required and the inversion algorithms can be run until full 
convergence. 



5 SUMMARY AND CONCLUSIONS 

The goal of this work is to exploit the Bayesian formalism to 
develop methods that reconstruct the underlying dark-matter dis- 
tribution from the discrete sample of galaxies and their three- 
dimensional positions provided by galaxy redshift surveys. Such 
a general Bayesian analysis permits one to innovate methods and 
push this field forward to develop more accurate reconstruction al- 
gorithms. 

We show how a series of uncertainties demand a statistical ap- 
proach (see figure[T]and section fTTTt . Some of the uncertainties are 
intrinsic to the nature of the underlying signal (the dark matter) and 
have a stochastic character, the cosmic variance. Other uncertain- 
ties are intrinsic to the nature of the observable (the galaxies) and 
lead to a kind of shot noise, galaxy-bias and redshift-distortions. 
Additional uncertainties, such as windowing, selection function ef- 
fects and blurring effects, arise due to the observation process. The 
degeneracies that are produced by such uncertainties require regu- 
larization techniques, which should converge to optimal solutions. 
We discuss the different Bayesian approaches specified through dif- 
ferent options for the likelihood and the prior, and see how natural 
regularizations can be performed by the prior-choice (see section 
12.5b . Moreover, we see how the definition of particular likelihoods 
and priors define classes of algorithms, each specific to a different 
problem approach (see tablelTJ. 

We develop new algorithms in this Bayesian framework which 
account for the discrete nature of a galaxy distribution by taking a 
Poissonian likelihood. This is done for the case of a Gaussian prior 
leading to the GAPMAP estimator (see section |?.5.4| and appendix 
lEt and for the case of an entropic prior (see section [2.5. 91 and ap- 
pendix|jj. The Maximum Entropy method is studied in detail as a 
non-informative prior, which does not assume a particular pattern 
for the underlying signal. This can be interesting when searching 



for intrinsic deviations from Gaussianity (see section [2.5.9| and ref- 
erences therein). 

We extend the Wiener-filter (see section |2!5.3| and appendixiBt 
and propose novel algorithms to do a joint estimation of the density 
field, its power-spectrum, and the peculiar velocities of the galaxies 
(see section |T6t . We also address the possibility of extending such 
work to determine cosmological parameters and the bias between 
galaxies and dark matter. 

Such an aim requires a large number of repeated reconstruc- 
tions, which can be only achieved with highly efficient inverse al- 
gorithms. We develop here the necessary numerical schemes in a 
preconditioned way for linear and non-linear inverse problems (see 
section |3] and appendix iKl & iMt. Such iterative schemes acquire 
their real power only in an operator formalism, which we derive 
in detail for different Bayesian methods (see section [373] l. A novel 
Kiylov formula (see section [3.1.5| and appendix [Kj turns out to be 
superior in terms of performance and fidelity, as we show in section 
0. 

The novel ARGO-software package is presented in this paper. 
Different inverse schemes are tested with the Wiener-filter imple- 
mented in ARGO under several conditions determined by struc- 
tured noise, blurring, selection function effects and windowing (see 
section|4ll. 

We conclude that fast three-dimensional reconstructions of the 
large-scale structure scaling as n logj n (with n being the total 
number of grid cells) can be done with hybrid Wiener-Krylov iter- 
ative schemes under an operator formalism, which takes advantage 
of the speed of FFTs. This opens new horizons of possibilities, such 
as joint parameter and signal estimation, in the field of large-scale 
structure reconstruction. 

It is our goal to apply such techniques to reconstruct the under- 
lying density field, the power- spectrum and the peculiar velocities 
from galaxy surveys. Still, different problems, such as galaxy-bias 
studies, have to be further analysed. However, we are confident that 
such issues can be tackled from an information-theory approach. 
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APPENDIX A: THE WIENER-FILTER AS A BAYESIAN 
ESTIMATOR 

Let us recall eq. l l21t which comes from Bayes theorem assuming 
a Gaussian prior and a Gaussian likelihood 

P{s I d,p) 

oc exp ^-i pS"^s + (d - Rs)^N"^(d - Rs)j^ . (Al) 

If we just look at the log-posterior distribution we have 
logP(s I d,p) ocs1"s-is + (d-Rs)'''N"^(d-Rs) 



Assuming a linear relation between the reconstruction ^ and the 
data d 



(B4) 



and statistical homogeneity ({s(fc)s(fc'))(s.e|p) 
(27r)^D(5D(fc - k')Ps{k')), yields 



A = 



(A2) 



F{k,k' 



We can combine the first two terms to one term: s^(cr^p) ^s, 
with (ctwf)^^ = (S^^ + R^N^^R). Since we want to obtain a 
log-posterior of the form 

logP(s I d,p) (X [s - (s)wp)^(o'wp^)~''(s — (s)wp), (A3) 

with {s)wF = Fwpd, we can identify the third and the fourth term 
of eq. l |A3l > with the corresponding terms in eq. iA3\ 



(27r)'VD 

d^Dq „ ^ 

^^--p^F(fc, q){d{k')d{q))^s,e\p) 



and 



- d^N-iRs = -d^Flj,{<T'ijp)-^s, 



(A4) 



(A5) 



respectively. The remaining term depends only on the data and is 
thus factorized in the posterior distribution function as part of the 
evidence. From both eq. JA4b and eq. l lA5b we conclude that the 
Wiener-filter has the form 

FwP = CTwpR'''N"^ = (S^^ + R'''N"^R)"^R'''N"\ (A6) 

This is the natural Bayesian representation in contrast to expression 
( |26t , which is the outcome of a generalized LSQ approach (see ap- 
pendixlBland discussion in section [T.5.3b . It can be shown that both 
expressions for the Wiener-filter are mathematically equivalent (see 
appendix let. 



APPENDIX B: THE MAPPING EQUATION FOR THE 
WIENER-FILTER IN K-SPACE 

Following the concept of m inimum variance (e.g. iRvbicki & Press! 
I1992I : IZaroubi et alj|l995h , we define an action given by the nor- 
malized volume integral of the square of the difference between 
the reconstruction (if)) and the ensemble of different possible real- 
izations of the density field (s — Sp) 



(Bl) 



From the statistical point of view, the action A is the loss function 
that has to be minimized. Note that this action can be expressed as 
the ensemble average of the squared Euclidean distance between 
the real density field a and the reconstruction t/j 



1 2 

-4 = :^{OEucl(^,s)>(S,e|p)- 

Transforming expression dBlt into Fourier space yields 



(B2) 



[wkwk)}(s,e\p) + (mm) 



- (i/i(fe)s(fc)>(s,e|p) - (s(fe)V)(fe))(s^g|p) 
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+{2^r°5D{k - k'){s{k')s{k'))^s,e\p) 

-F{k,k'){d{k')l(k))(s,e\p) 

-P{k,k'){s{k)W))is.e\p)\ 



(B5) 



Now the action is minimized with respect to the linear operator, 
^ = 0, to obtain the following mapping equation 



(27r)'VD 



F{k,q){d{q)d{k'))(s,e\p) = {md{k')) 



(s,e\p)- 



(B6) 

The desired filter can be thus expressed as the correlation matrix 
between the signal and the data multipli ed by the invers e of the 
autocorrelation matrix of the data (see Zar oubi et alj|l99^ 



F= {sd'^){dd'^y 



(B7) 



This filter is the LSQ estimator (see eq. [24). It is identical to the 
Wiener-filter in case the noise term has no signal-dependent struc- 
ture function or after applying an ensemble average over all pos- 
sible signals on the noise covariance matrix. Note, that eq. l |B6b 
allows us to substitute k' by — fc', which is equivalent to the con- 
jugation of d{k') due to the hermitian redundancy of real numbers 

(B8) 

The linear operator one obtains in this way is different, but fulfills 
the same requirements. We compare both cases in section Let 
us see how one would apply such a filter. The covariance matrix of 
the data is given by 

(d(fc)d(fc'))(s,e|p) = (a(fc)Q(fc'))(s,e|p) + (e(fc)e(fc')>{s,e|p) , 

(B9) 

and its action on some vector by 



/ 



(a(fc)a(fe'))(s,e|p){i-(fe')} 



(27r)'VD 

= /b ■ [fsM O [Ps ■ [fsM O [/b • {*}] 



(BIO) 



and 



J 



{e(fe)e(fe')>{s,e|p){£(fe')} = /SF ° [Pn ■ [/sF o {x}]] (fe). 



(BU) 

The correlation matrix between the data and the signal applied to 
that vector yields 



J g^{S(fc)d(fc'))(s,e|p){*(fc')} = Ps ■ [fsM o [/b • {£}]]{k). 



{s,e\p) (B12) 
We see that the difference with respect to the operations derived in 
(B3) section l l3.3t resides in the conjugation of certain functions. 
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APPENDIX C: DATA-SPACE AND SIGNAL-SPACE 
REPRESENTATIONS FOR THE WIENER-FILTER 

Here we show the equivalence between the data-space and 
the signal-space representations for the Wiener-filter (see sec- 
tion l2.5.3l ). In a first approach, we start assuming that the inverse of 
the response operator exists (R~^). Then after some operations the 
equivalence can be shown for both the Wiener-filter 



(S^^ +R'''N"^R)~^R'''N"\ 
(S-'(R)''N + Rt)-\ 
SR''"(RSR''' +N)"\ 



(CI) 



and the covariance 



2 

"■WF 



= (S"' + R'''N"'R)"\ 

= SR''"(R + RSR''"N"^R)"\ 

= SR''"(RSR"'" +N)"'N(R^)"\ 



(C2) 



Note that the covariance given by eq. l |C2t has limited practical 
use, since it requires the inverse of the response operator R, which 
is in general a singular matrix. To find a data-space representation 
for the covariance one has to introduce the concept of constrained 
realizations (see section [2.6. 21 and appendix |D]l. In order to find a 
general proof for the equivalence between the data-space and the 
signal-space representation of the Wiener-filter, we have to look at 
the residuals 

o-WF = {rr'') = ((s - Fwpd)(s - Fwpd)^} (C3) 
= S - SR+FwF^ - FwfRS + Fwf(RSR''" + N)FwF^ 

where we have done the substitution: d — Rs + e and (se^) = 0. 
The first two terms lead to the Wiener covariance, as we show here 



(S(crwF) — SR^FwF^(o'WF^) ^)o'wf 



(S(S"' + R^N^'R) - SRfN-'R)(TW 



2 

WF, 



(C4) 



where we have used the signal-space relation obtained in sec- 
tion JA^: Fwf ~ cr^pR^N^^. Consequently, the last two terms 
of eq. l |C4| ( have to cancel out 



= —FwfRS + Fwf (RSR"!" + N)F V 
= Fwf (-RS + (RSR"!" + N)F WF , 



(C5) 



Now we take the transpose and conjugate of the last equation and 
factorize the data correlation matrix out (which is always invertible, 
since the noise covariance matrix is invertible) 



= (Fwf - SR'''(RSR''" + N)"')(RSR''' + N)Fty 



(C6) 



The last equation motivates the data-space representation of the 
Wiener-filter without performing least squares, i.e. without de- 
manding the Filter to be optimal (9(Twf /9Fwf = 0), which is 
already imposing some regularity condition on Fwf. Note that we 
also obtain the trivial zero solution (Fwf = 0), which is equivalent 
to R = or N = oo with covariance cr^ = S. Since the data- 
space and the signal-space representation have the same null-spaces 
eq. l |C6l ) already proves the equivalence between the data-space and 
the signal-space representations for the Wiener-filter. Nevertheless, 



let us directly test this equivalence 

SR'''(RSR''' +N)"' = o-^fR'''N"' 

SR"!" = cr|vFR'''N"'(RSR''' +N) 

RS = (RSR"!" +N)N"^Rctwf 

RS = RSR'''N~'Ro-^p +Ro-^F 

RS(ctwf)"^ = RSR'''N"^R + R 

RS(S"^ + R'l'N^^R) = RSR'''N"^R + R 



R + RSR^N R = RSRN R + R 



tM-ll 



(C7) 



Since the left-hand-side is equal to the right-hand-side both repre- 
sentations are equivalent. Note that we did not assume the response 
operator to be invertible. We solely demanded that the inverse of 
the signal and of the noise covariance matrices can be built (3S~^ 
and 3N~^). This implies that the covariance matrix and the inverse 
of the data autocorrelation matrix exist (3(S~^ + R^N^^R)^^ and 
3(RSR'I' +N)-'), as we required in our proof. 



APPENDIX D: COVARIANCE OF A CONSTRAINED 
REALIZATION 



Following iHoffmari & Rib^ ( Il99lh : lOanon & Hoffman! ( Il993h : 
iBistolas & Hoffman! ( Il99i) ^ we can generate a synthetic realization 
with 

y = s - Fwpd, (Dl) 

If the following relations holcPI: (ss^) = S, (ee^) = N and 
(se^) = then we obtain 

(yy^) = ((S - FwFd)(s - FwFd)^) (D2) 
= S - SR F WF^ — FwfRS + Fwf (RSR''' +N)Fwf''' 

We can identify these terms with eq. l |C4| l. Thus, following relation 
is fulfilled 



{yy'') = (rr'') = fTwF- 



(D3) 



APPENDIX E: GAPMAP: MAP WITH A GAUSSIAN 
PRIOR AND A POISSONIAN LIKELIHOOD 

Remember P(s \ d,p) oc £{d \ s,p)P{s \ p) to be extremized. 
First we write the log-likelihood taking the logarithm of eq. il2\ 



log £(s I d, p) = ^ 1^ - (Rs')i ~ d + d'i log (^(Rs')i + d 

i 

-log(d^!)]. (El) 
Then we differentiate with respect to the signal to yield 
dlog C{s \ d,p) 



dsk 



[R^kbni(^-l + {Y,R^Js'J+C^)-^d'^y 



The same exercise for the Gaussian prior leads to 
9 log P(s I p) 



dsk 



(E2) 



Note that the realization does not need to be Gaussian distributed, but 
just fulfill these requirements. 
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Now we demand = 9 log P{s \ d, p) /dsk to get an equation for 
the MAP estimator. After applying S to the equation we obtain 



(E3) 



Adding the index j + 1 and j to s on Ihs and rhs respectively, an 
iteration scheme is formed 



(E4) 



^SkiRiibUg ( - 1 + ( X] -Ri,„ng(l + &s^) + Cj^ d'i 



Let us simplify this algorithm for positive signals s' in matrix no- 
tation 

^0+1 ^ _,2gjjt 1^ _ 1 + diag(Rs'^' + c)-'d'j + s', (E5) 

where we made following substitutions 6^1 and rTg ^ s' , with 
s' being the average of the positive signal. 



APPENDIX F: POISSONIAN MAXIMUM LIKELIHOOD 

The context in which the Richardson-Lucy algorithm is applied has 
positive intensity signals and the kernel R in eq. l[T) is understood 
as a blurring function that can be expressed mathematically as a 
convolution with the true signal s. We will further assume no back- 
ground (c = 0) so that the log-likelihood of eq. l ll2t can be written 
as 

.g£(s' I d',p) = ^ [ - (Rs'). + d:iog(Rs'). - log(d:! 



lo: 

differentiating with respect to the signal yields 
91og£(s' \d',p 



(Fl) 







ds'k 



J2 [R:k(-l + {Rs')r'd\)]. (F2) 



We can multiply this equation with the signal s' and make an iter- 
ative method which coincides with Richardson-Lucy algorithm 



(F3) 



s'^+' = diag(^Rtdiag(Rs'^)-'d'js'^ 
with R^ 1 = 1 due to the convolution operation. 



APPENDIX G: COBE-FILTER 

We briefly show here that the COBE-filter is an unbiased estimator 
only and only if the response matrix is invertible. 

((s)coBE>(d|^,p) = {{R'N-'R)-'R'N-'d)^ais.p) 

= (RtN-^R)-^RtN-^(Rs + e)(rfl^^p^ 
= (R+N"^R)"^R"''N"^Rs 



APPENDIX H: LINEAR FILTERS NEED TO BE 
INVERTIBLE TO CONSERVE INFORMATION 

The Fisher information matrix J for a Gaussian distribu- 
tioiF^ with zero mea n and covariance matrix C calculated by 
IVogelev & Szalavl il99h) has the form 



with 



(HI) 



(H2) 



where the comma notation C,i stands for the derivative with respect 
to the parameter 9i: dC/dOi. Following Tegmark ( 1997), we calcu- 
late the Fisher information matrix J for the filtered and un-filtered 
signal. Let us assume a linear filter L, which provides us with an 
estimator of the signal 

(s)l = Ld. (H3) 
The correlation matrix of the estimator yields 

C°=' = {(s)i.{s)l)^s.e\p) = Lt (RSRt + n) L. (H4) 
We get then 

C°f = V (^RS.^R''") L, (H5) 

Gf = L (rSR^ + l''"l''" (^RS,,R^) L, (H6) 

where we have denoted the approximate inverse of L as L. Doing 
the same for the data yields 

(.data ^ (ddt>(s,e|p) = (rSR^+n) , (H7) 

(.data ^ RS,iRl", (H8) 

gdata ^ ^jjgjjt ' (^RS,.Rt) . (H9) 

If we now insert expression JH6t in the Fisher matrix dHlb . we get 
Jtf = itr(GrGf) (HIO) 



LC 



data—ly t-|- f >-,dat. 



data— Ij^tj^l (.dat 



In general, this will differ from the Fisher matrix of the data. If we 
assume, however, that the linear operator is invertible (3L^^), then 
eq. jHl lb reduces to 



J^f = itr fL-'Gf'"Gf*"L) 



(Hll) 



Invoking that the trace of a product of matrices is invariant under 
cyclic permutations, we see that 



1 , / /-.data^data 

-tr G, G 



J 



(HI 2) 



— s, if R is invertible. 



This shows the result that any linear invertible filter conserves in- 
formation, regardless of the parameters that one wants to estimate. 
However, one should be careful with this statement because linear 
filters are, in general, not invertible unless the data and signal space 
have the same dimension, the noise is non-zero for any frequency, 
and the R- and S-matrices are invertible. Usually the data and signal 
space will differ and the R-matrix will not be exactly invertible. 



Here a Gaussian likelihood is assumed, but the result does not rely on 



(Gl) the Gaussianity of the data (see e.g. lSeliakiri998l) . 
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APPENDIX I: JEFFREY'S PRIOR FOR THE 
3-DIMENSIONAL POWER SPECTRUM 

Let us start by assuming a Gaussian likelihoocP^ 

P(S I P.(fe)) rvTT _ ^ p^pf„MMl! 



The log-likelihood is then given by 



log {P{s 1 Ps(fe))) (X ^ [log (Ps(fc)) + 



Ps(fe) 



(II) 



(12) 



We now need the second derivatives of the log-likelihood with re- 
spect to the parameter Ps 



dPsiky 



■lo: 



.g(p(s|Ps(fc))) (X [- 



' : + ^l.(I3) 



P|(fe) P|(fc) 



The next step consists of calculating the Fisher information by per- 
forming the integral 

f dsP{s I Ps(fe)) on the above quantity, which is equivalent to 
performing the following ensemble average (see section |2j2j 

JiPsik)) = „^, log(p(s I Ps(fe)))>(s|p) cx -p^, 



9Ps(fe)' 



(14) 



where we have taken into account that Ps(fc) = (|s(fc)| 
Finally the square-root of the Fisher information leads to Jeffrey's 
prior 

P(Ps(fc)) = ^/J{Psik)) oc Psiky\ (15) 

Following IWandelt et al. we can argue in a more intuitive 

way that P(Ps(fc)) cx Ps(fc)~^ is a solution to a measure invari- 
ant under scale transformations of the form 

P(Ps(fe))dPs(fc) = P{aPs{k))adPs(k) (here we have general- 
ized this result to the 3-dimensional power spectrum). 



APPENDIX J: MEM WITH GAUSSIAN AND POISSONIAN 
LIKELIHOODS 

The quantity to maximize is given by 

Q^(s I p) = aS^'is i p) + log C{s \d,p). (Jl) 

After some calculations we see that the gradient of the entropy 
for PADs is 



\ rrii 



and for positive and negative distributions 

' Wi + Si 



vs^;(s|p), = -iog 



(J2) 



(J3) 



We took into account that dwi/dsj — Si/wiSij. It is then more 
straightforward to calculate the 5*^ curvature for PADs 

VV5^(s' |p) = -diag(s')"\ (J4) 
and for positive and negative distributions, 

VV5'±(s I p) = -diag(u;)-\ (J5) 
Analogously, we calculate the gradient of the log £(s | d) for the 

Note that the likelihood for (fc) is the prior for s. 



Gaussian case valid for positive (s') and positive and negative sig- 
nals (s±) 



Vlog£G(s|d,p). = -ivx'(s) 
and the corresponding curvature 



RTN"'(Rs - d) 



(J6) 



VVlogCcis I d,p) = --VVx^(s) = -R^N-Ir. (J7) 
The Poissonian case leads to 
Vlog£p(s I d,p)j 

= bniJ2 1 + {J2Rkjs',+c,)-'d',)] 

k j 

= 6rT^ [Rt - 1 + diag (Rs' ) + " ' d' ) ] , 

and 

VVlog£p(s I d,p),j 

= -6^%^ ^ [-Rfci(^ Rkis'i + Ck)~^Rkjd' 



— —0 rig- 



n^^ [r^ (^diag (^(Rs') + cj R^d' 



Note that when dealing with over-density fields one should do the 
following substitution: s'i — %(1 + bsi) in the last two expres- 
sions. 

Summing up, we have the following gradient of for PADs 
VQ+(s' |p), = -Qlogf^^ + VlogC{s' I d,p)„ (J8) 
and for positive and negative distributions 

Wi " s 



V(5±(S I p)^ = -«log 



nii 



- + Vlog£(s I d,p)„ (J9) 



and the corresponding curvatures 
VVQ+(s' I p) = -adiag(s')"' + VVlog£(s' | d,p), (JIO) 

VV(3|(s I p) = -Qdiag(w;)"^ + VVlog£(s | d,p). (Jll) 

The corresponding likelihood (Gaussian or Poissonian) has to be 
inserted in each of the expressions for the gradient or curvature 
of Q^. For the choice of a n op timal regularization constant a see 
e.g. 



Maisinger et iD (Il997h and lHobsonet"aD l ll998t) . 



APPENDIX K: KRYLOV METHODS: CONJUGATE 
GRADIENTS 

Kl Orthogonality between the residuals and the searching 
vectors 

Eq. l |93t tells us that each error vector 77-'^^ is A-orthogonal to the 
previous searching vector M/x-* . Since all different searching vec- 
tors M/x' are A-orthogonal to each other by construction, and the 
error vectors are given by the linear combination of the previous 
error vector and the previous searching vector (eq. ll90t). it follows 
that each error vector 77^+^ is A-orthogonal to all previous search- 
ing vectors fi', i.e. for i ^ j. 



Using eq. i91\ we can write eq. jKU as 

(|^+^|M/x')=0, 



(Kl) 



(K2) 
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being i ^ j. 

Applying the inner product between the searching vectors 
M/i' and the recurrent formula for the residuals (eq.l92t. we get 

(|^+'|M/i') = (i^|M/x') -r^(M/x^|M/x')A. (K3) 

For i ^ j this equation reduces to 

(|^+^|Mm") = (^^|M/i'). (K4) 

From eq. iK2l and eq. l lK4t we conclude that for i < j, 

(4^|M/x*> = 0. (K5) 



K2 The set of residuals as a basis of linearly independent 
vectors 

Taking the Gram-Schmidt orthogonalization scheme (eq. 196) and 
multiplying it with the residuals, we obtain 

j-i 

{e\M,j.^) = {e\Me) + j2i3''{e\Mti''). (K6) 

fc=0 

Using the result obtained in the appendix IKl I (eo . l K5t . one shows 
the orthogonality (strictly orthogonal, if M = I) between any dif- 
ferent residuals (for i 7^ j j3 



IM^-'} = 0. 



(K7) 



For i = j by combining iK5\ and ( |K6t we get the relation we used 
in equation l |95t 



(riMM') = (riMr> 



(K8) 



K3 Formulae for the /3-factor 

From the scalar product between eq. i92\ and the residual 

(e^'iMC) = (erne) - t'{m,i'\mc)x, (K9) 

it is clear that the /J-factors are all zero except for one. Notice that 
the denominator in l3, given by (M/^-' |M{*)^ cancels out if neither 
i = j + 1 nor i — j. The latter is excluded according to the def- 
inition of (3 (see eqs.|96|and|98K Gram-Schmidt orthogonalization 
thus simplifies to eq. l |99t , with 



Pexp ^ 



(MMi|M/x^>A • 

Other expressions for this factor can be derived by replacing i 
j + 1 in eq. l [K9l ) 



(KIO) 



(Kll) 



Substituting this expression in eq. J98t and using the formula for 
T-' (eq.l95ll one obtains the Fletcher-Reeves equation 



/3i 



,+1 _ {e+'\Me+'} 



(K12) 



Polak-Ribieres formula can now be obtained trivially by taking ex- 
pression iK7\ into account. Let us do an invariant operation by 
adding — ({-'^^ IM^-*) to the nominator in Fletcher-Reeves formula 

■^^ This result is at first glance only valid for i < j. However, with the 
additional requirement that the matrix M be self-adjoint, the generalization 
toi j is trivial. 



which immediately leads to Polak-Ribieres expression 

,+1 _ (e+^|M(e+^-e)) 



(erne 



(K14) 



In order to get Hestenes-Stiefels formula one has to consider 
eqs. iKSl and l lK5t in the denominator of /3pR 



(K15) 



resulting in the following expression 



(K16) 



Due to the relations derived in this appendix other equivalent for- 
mulae for /3 (summarized in table [2]l can be found, which differ in 
their numerical behavior. Note that from the 16 possible schemes 
presented here, only 3 are discussed in the literature. 



K4 Preconditioned non-linear time step 

The function under consideration is expanded until the second or- 
der around t-'M/x-' according to eq. 



(K17) 



Then the derivative with respect to the searching vector is done to 
find the extremum 

^ (VOa(V'')|M/x^) +r^(MM^|M/xO^^^^^^,^. (K18) 

By setting this equation to zero, one finds an expression for the time 
step 



(V0a(V'')|M/x^) 



(K19) 



Note that the last equation can be rewritten using relation 1 IK8I 1 as 



(K20) 



APPENDIX L: BAYES, TIKHONOV, ASYMPTOTIC 
REGULARIZATION AND LEARNING ALGORITHMS 



We want to solve eq. ( I66t from a Bayesian perspective. Let us as- 
sume a Gaussian likelihood with covariance I 



Ci-iP I =G(/-AV',I), 



(LI) 



which is a fair assumption in the absence of noise (eq. l l66t is equiv- 
alent to eq. ^ without noise, e = 0). Let us further assume a Gaus- 
sian prior around a prior solution ip* with covariance rM 



P(V' I p) = G{ip - ■^*,tM 



(L2) 



We can now calculate the MAP which coincides in this case with 
the mean of the posterior. Let us look at the quantity given by the 
log-posterior PDF 

\\f -xn"" + r\\^-r\\l^, 



(L3) 
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which is a generahzation of Tikhonov regularization. Minimizing 
the negative log-posterior yields the following equation for the 
Bayesian estimator {i/')b 

At(A(V>B - /) + t-^M{Wb - ik*) = 0. (L4) 

If we now choose M = A^M^^ (M is an invertible matrix) we get 

At (M-^(^* - (iA)b) + r{f - A(iA)b)) = 0, (L5) 

This equation will be fulfilled if the following equality holds 

{iA)b =i/'*+rM(/-A(i/>)B). (L6) 

The estimator (i/')b for the solution to the inverse problem 
(eq. l l66t ) is expressed in eq. ( IL6t as the prior solution tp* plus 
a correction term given by the residual / — A('i/>)B. Since only the 
residual based on the prior solution is known, the following sub- 
stitution must be done on the right-hand- side (rhs) {i/')b — > tp* 
leading to 

(V')b ii^A* +rM(/-AV'*). (L7) 

This can be interpreted as an iterative scheme, in which the estima- 
tor is the update j + 1 ((i/))b ^ V'"'^^ on the left-hand-side (Ihs)) 
of the estimator at the previous step j (il>* i/'-' on the rhs) 

Vj^+^ = i/)^' +rM(/ - Ai/)^). (L8) 

In this way, we have found the general iterative method (eq. [77} 
derived with the asymptotic regularization in section J3.1.2t . From 
the Bayesian point of view, this scheme could be interpreted as a 
learning algorithm, in which the estimator of the solution to the 
inverse problem is calculated from the prior solution and becomes 
itself the prior solution for the subsequent iteration. 



and summing over i we get 

J i i 

+ =^ip[i] ~^MAip[i\. (M7) 

1=0 i=0 1=0 

Manipulating the indices, we see that 

+ = ^i/'[j] -lAM- (M8) 

1=0 i=0 

Combining the last two equations we obtain eq. l l77p^ 

1/)^'+^ = +M(/ - Ai/J^), (M9) 

with 

■ip[0] = ip" =Mf. (MIO) 

The meaning of the preconditioning matrix M is clear when we 
look at eq. l lM2t . There it can be seen that a much more rapid con- 
vergence is obtained if (I — MA) is close to zero, that is if M is 
close to the inverse of A. 



APPENDIX M: PRECONDITIONING 

We can enhance the convergence of the iteration methods by mul- 
tiplying the matrix we want to invert by another matrix that is close 
to its inverse 



MAi/) = Mf, 



(Ml) 



with M ~ A^^. Let us show this by deriving eq. illj in a differ- 
ent way. We can invert MA using the Neumann expansion for the 
inverse of an operator 



tp = (MA)"^M/ = ^(I - MA)'M/. 



(M2) 



This iteration scheme will converge if ||I — MA|| < 1. Let us in- 
troduce the following notation 



with 



It follows that 



i=0 

i=0 

i/'W = (l-MAYMf. 
+ = (l-MA)ip[i\, 



(M3) 



(M4) 



(M5) 



(M6) 



The iteration time step r has been absorbed here in the matrix M. 

© 0000 RAS, MNRAS 000, 000-000 



